
Reliability Criteria in Information Theory and in Statistical Hypothesis Testing
Evgueni A. Haroutunian
National Academy of Sciences of Armenia
Republic of Armenia
eghishe@sci.am

Mariam E. Haroutunian
National Academy of Sciences of Armenia
Republic of Armenia
armar@ipia.sci.am

Ashot N. Harutyunyan
Universität Duisburg-Essen
Germany
ashot@iem.uni-due.de

Boston – Delft
Foundations and Trends® in Communications and Information Theory

Published, sold and distributed by:


now Publishers Inc.
PO Box 1024
Hanover, MA 02339
USA
Tel. +1-781-985-4510
www.nowpublishers.com
sales@nowpublishers.com

Outside North America:


now Publishers Inc.
PO Box 179
2600 AD Delft
The Netherlands
Tel. +31-6-51115274

The preferred citation for this publication is E. A. Haroutunian, M. E. Haroutunian and A. N. Harutyunyan, Reliability Criteria in Information Theory and in Statistical Hypothesis Testing, Foundations and Trends® in Communications and Information Theory, vol 4, nos 2–3, pp 97–263, 2007.

ISBN: 978-1-60198-046-5
© 2008 E. A. Haroutunian, M. E. Haroutunian and A. N. Harutyunyan

All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, mechanical, photocopying, recording
or otherwise, without prior written permission of the publishers.
Photocopying. In the USA: This journal is registered at the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923. Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by now Publishers Inc for users registered with the Copyright Clearance Center (CCC). The 'services' for users can be found on the internet at: www.copyright.com

For those organizations that have been granted a photocopy license, a separate system of payment has been arranged. Authorization does not extend to other kinds of copying, such as that for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. In the rest of the world: Permission to photocopy must be obtained from the copyright owner. Please apply to now Publishers Inc., PO Box 1024, Hanover, MA 02339, USA; Tel. +1-781-871-0245; www.nowpublishers.com; sales@nowpublishers.com

now Publishers Inc. has an exclusive license to publish this material worldwide. Permission to use this content must be obtained from the copyright license holder. Please apply to now Publishers, PO Box 179, 2600 AD Delft, The Netherlands, www.nowpublishers.com; e-mail: sales@nowpublishers.com
Foundations and Trends® in Communications and Information Theory
Volume 4, Issue 2–3, 2007
Editorial Board

Editor-in-Chief:
Sergio Verdú
Department of Electrical Engineering
Princeton University
Princeton, New Jersey 08544

Editors

Venkat Anantharam (UC. Berkeley) Amos Lapidoth (ETH Zurich)


Ezio Biglieri (U. Torino) Bob McEliece (Caltech)
Giuseppe Caire (U. Sounthern Neri Merhav (Technion)
California) David Neuhoff (U. Michigan)
Roger Cheng (U. Hong Kong) Alon Orlitsky (UC. San Diego)
K.C. Chen (Taipei) Vincent Poor (Princeton)
Daniel Costello (U. Notre Dame) Kannan Ramchandran (UC.
Thomas Cover (Stanford) Berkeley)
Anthony Ephremides (U. Maryland) Bixio Rimoldi (EPFL)
Andrea Goldsmith (Stanford) Shlomo Shamai (Technion)
Dave Forney (MIT) Amin Shokrollahi (EPFL)
Georgios Giannakis (U. Minnesota) Gadiel Seroussi (MSRI)
Joachim Hagenauer (TU Munich) Wojciech Szpankowski (Purdue)
Te Sun Han (Tokyo) Vahid Tarokh (Harvard)
Babak Hassibi (Caltech) David Tse (UC. Berkeley)
Michael Honig (Northwestern) Ruediger Urbanke (EPFL)
Johannes Huber (Erlangen) Steve Wicker (Cornell)
Hideki Imai (Tokyo) Raymond Yeung (Hong Kong)
Rodney Kennedy (Canberra) Bin Yu (UC. Berkeley)
Sanjeev Kulkarni (Princeton)
Editorial Scope

Foundations and Trends® in Communications and Information Theory will publish survey and tutorial articles in the following topics:

• Coded modulation
• Coding theory and practice
• Communication complexity
• Communication system design
• Cryptology and data security
• Data compression
• Data networks
• Demodulation and equalization
• Denoising
• Detection and estimation
• Information theory and statistics
• Information theory and computer science
• Joint source/channel coding
• Modulation and signal design
• Multiuser detection
• Multiuser information theory
• Optical communication channels
• Pattern recognition and learning
• Quantization
• Quantum information processing
• Rate-distortion theory
• Shannon theory
• Signal processing for communications
• Source coding
• Storage and recording codes
• Speech and image compression
• Wireless communications

Information for Librarians

Foundations and Trends® in Communications and Information Theory, 2007, Volume 4, 6 issues. ISSN paper version 1567-2190. ISSN online version 1567-2328. Also available as a combined paper and online subscription.
Foundations and Trends® in Communications and Information Theory
Vol. 4, Nos. 2–3 (2007) 97–263
© 2008 E. A. Haroutunian, M. E. Haroutunian and A. N. Harutyunyan
DOI: 10.1561/0100000008

Reliability Criteria in Information Theory and in Statistical Hypothesis Testing

Evgueni A. Haroutunian¹, Mariam E. Haroutunian² and Ashot N. Harutyunyan³

¹ Institute for Informatics and Automation Problems, National Academy of Sciences of Armenia, Yerevan, Republic of Armenia, eghishe@sci.am
² Institute for Informatics and Automation Problems, National Academy of Sciences of Armenia, Yerevan, Republic of Armenia, armar@ipia.sci.am
³ AvH Fellow, Institut für Experimentelle Mathematik, Universität Duisburg-Essen, Essen, Germany, ashot@iem.uni-due.de

To the memory of Roland Dobrushin, the outstanding scientist and wonderful teacher.

Abstract

This survey is devoted to one of the central problems of Information Theory: the problem of determining the interdependence between coding rate and error probability exponent for different information transmission systems. The overview deals with memoryless systems in the finite alphabet setting. It presents material complementary to the contents of the series of the most remarkable books in Information Theory by Feinstein, Fano, Wolfowitz, Gallager, Csiszár and Körner, Kolesnik and Poltirev, Blahut, Cover and Thomas, and of the papers by Dobrushin, Gelfand and Prelov.

We briefly formulate fundamental notions and results of Shannon theory on reliable transmission via coding and give a survey of results obtained in the last two to three decades by the authors, their colleagues, and some other researchers. The paper is written with the goal of making the theory of rate-reliability accessible to a broader circle of readers. We regard this concept as useful for promoting the solution of the noted problem, in parallel with the elaboration of the notion of reliability–reliability dependence relative to statistical hypothesis testing and identification.
Preface

This monograph is devoted to one of the central problems of Information Theory: the problem of determining the interdependence between coding rate and error probability exponent for different information transmission systems. The overview deals with memoryless systems in the finite alphabet setting. It presents material complementary to the contents of the series of the most remarkable books in Information Theory.

We briefly formulate fundamental notions and results of Shannon theory on reliable transmission via coding and give a survey of results obtained in the last two to three decades by the authors, their colleagues, and some other researchers. The review was written with the goal of making the concept of rate-reliability accessible to a broader circle of readers. We regard this concept as useful for promoting the solution of the noted problem, in parallel with the elaboration of the notion of reliability–reliability dependence relative to statistical hypothesis testing and identification.

The authors are grateful to R. Ahlswede, V. Balakirsky, P. Harremoës, and N. Cai for their useful inputs to earlier versions of the manuscript.


The comments and suggestions of S. Shamai (Shitz), G. Kramer, and the anonymous reviewers are highly appreciated. The participation of our colleagues P. Hakobyan and S. Tonoyan in the revision of the manuscript was helpful.

A. Harutyunyan acknowledges the support of the Alexander von Humboldt Foundation for his research at the Institute for Experimental Mathematics, Essen University.
Contents

1 Introduction 1
1.1 Information Theory and Problems of Shannon Theory 1
1.2 Concepts of Reliability Function and of Rate-Reliability Function 2
1.3 Notations for Measures of Information and Some Identities 5
1.4 Basics of the Method of Types 7

2 E-capacity of the Discrete Memoryless Channel 9
2.1 Channel Coding and Error Probability: Shannon's Theorem 9
2.2 E-capacity (Rate-Reliability Function) of DMC 16
2.3 Sphere Packing Bound for E-capacity 16
2.4 Random Coding Bound for E-capacity 20
2.5 Expurgated Bound for E-capacity 24
2.6 Random Coding and Expurgated Bounds Derivation 25
2.7 Comparison of Bounds for E-capacity 29

3 Multiuser Channels 33
3.1 Two-Way Channels 33
3.2 Interference Channels 38
3.3 Broadcast Channels 43
3.4 Multiple-Access Channels 47

4 E-capacity of Varying Channels 57
4.1 Bounds on E-capacity for the Compound Channels 57
4.2 Channels with Random Parameter 59
4.3 Information Hiding Systems 62
4.4 Multiple-Access Channels with Random Parameter 66
4.5 Arbitrarily Varying Channels 74

5 Source Coding Rates Subject to Fidelity and Reliability Criteria 77
5.1 Introductory Notes 77
5.2 The Rate-Reliability-Distortion Function 78
5.3 Proofs, Covering Lemma 85
5.4 Binary Hamming Rate-Reliability-Distortion Function 90
5.5 Reliability Criterion in AVS Coding 97

6 Reliability Criterion in Multiterminal Source Coding 103
6.1 Robust Descriptions System 103
6.2 Cascade System Coding Rates 112
6.3 Reliability Criterion in Successive Refinement 117

7 Logarithmically Asymptotically Optimal Testing of Statistical Hypotheses 129
7.1 Prelude 129
7.2 Reliability Function for Two Alternative Hypotheses 130
7.3 Multiple Hypotheses: Interdependence of Reliabilities 136
7.4 Optimal Testing and Identification for Statistical Hypothesis 141

Basic Notations and Abbreviations 153

References 157
1
Introduction

1.1 Information Theory and Problems of Shannon Theory

The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point.

Claude Shannon, 1948

Information Theory as a scientific discipline originated from the landmark 1948 work "A mathematical theory of communication" [191] of the American engineer and mathematician of genius Claude E. Shannon, and has since existed as a formalized science for more than half a century. In the Guest Editorial [215] to the "Commemorative issue 1948–1998" of the IEEE Transactions on Information Theory, Sergio Verdú attested: "With communication engineering in the epicenter of the bombshell, the sensational aftermath of Shannon's paper soon reached Mathematics, Physics, Statistics, Computing, and Cryptology. Even Economics, Biology, Linguistics, and other fields in the natural and social sciences felt the ripples of Shannon's new theory." In his wise retrospective [82] on the founder's life and scientific heritage, Robert Gallager wrote: "Claude E. Shannon invented information theory and


provided the concepts, insights, and mathematical formulations that now form the basis for modern communication technology. In a surprisingly large number of ways, he enabled the information age." The exceptional role of Claude Shannon in the development of modern science was noted earlier by Andrey Kolmogorov in the Preface to the Russian edition of Shannon's "Works on Information Theory and Cybernetics" [197] and by Roland Dobrushin in the Editor's Preface to the Russian translation of the book by Csiszár and Körner [51].
In [191] and in another epochal work [194], Shannon mathematically addressed the basic problems of communication and gave their solutions, stating the three fundamental discoveries underlying information theory: the problem of transmission via a noisy channel and its inherent concept of capacity; data compression, with the central role of entropy therein; and source coding under a fidelity criterion, with specification of the possible performance limit in terms of the mutual information introduced by him.

The term "Shannon Theory" is now generally accepted to mean the subfield of information theory which deals with the establishment of performance bounds for various parameters of transmission systems.
The relevant sections of this review treat the noted fundamental results and go further toward generalizations and solutions of those problems for some classical and more complicated communication situations, focusing on the results and methodology developed mainly in the works of the authors, related to the role of the error probability exponent as a characteristic in the mathematical model of an information transmission system.

Taking into account the interconnection of statistical, probabilistic, and information-theoretical problems, we also add results on the investigation of the error exponent (reliability function) in statistical hypothesis testing models.

1.2 Concepts of Reliability Function and of Rate-Reliability Function

Important properties of each communication channel are characterized by the reliability function $E(R)$, introduced by Shannon [195] as the optimal exponent of the exponential decrease

$$\exp\{-N E(R)\}$$

of the decoding error probability as the code length $N$ increases, for a given transmission rate $R$ less than the capacity $C$ of the channel [191]. In an analogous sense one can characterize various communication systems. The reliability function $E(R)$ is also called the error probability exponent. Besides, by analogy with the concept of the rate-distortion function [26, 194], the function $E(R)$ may be called the reliability-rate function.
A large number of works is devoted to the study of this function for various communication systems. Along with achievements in this part of Shannon theory, many problems have remained unsolved. Because of the principal difficulty of finding the reliability function for the whole range of rates $0 < R < C$, this problem is completely solved only in rather particular cases. The situation is typical when the obtained upper and lower bounds for the function $E(R)$ coincide only for rates $R$ in some interval, say $R_{crit} < R < C$, where $R_{crit}$ is the rate at which the derivative of $E(R)$ with respect to $R$ equals $-1$.

It is desirable to create a more harmonious general theory and more effective methods for the construction of usable bounds for new classes of more complicated information transmission systems. It seems that the approach developed by the authors is fruitful for this purpose. It consists in studying the function $R(E) = C(E)$, inverse to $E(R)$ [98, 100, 102]. This is not a simple mechanical permutation of the roles of the independent and dependent variables, since the investigation of the optimal rates of codes ensuring, as $N$ increases, exponential decrease of the error probability with a given exponent (reliability) $E$ can be more expedient than the study of the function $E(R)$.

At the same time, there is an analogy with the problem from coding theory of bounding the optimal volume of codes depending on their correction ability. This allows one to hope for a profitable application of the results and methods of each theory in the other. The definition of the function $C(E)$ is in natural conformity with Shannon's notions of the channel capacity $C$ and of the zero-error capacity $C_0$ [152]. When $E$ increases from zero to infinity, the function $C(E)$ decreases from $C$ to $C_0$ (this is so if $C_0 > 0$; otherwise $C(E) = 0$ when $E$ is large enough). So,
by analogy with the definition of the capacity, this characteristic of the channel may be called the E-capacity. On the other hand, the name rate-reliability function is also logical. One of the advantages of our approach is the convenience in studying the optimal rates of source codes ensuring a given exponential decrease of the probability of exceeding a given distortion level in the restoration of messages. This will be the rate-reliability-distortion function $R(E, \Delta, P)$, inverse to the exponent function $E(R, \Delta, P)$ of Marton [171]. So the name shows which dependence of characteristics is under study. Later on, it is possible to consider also other arguments, for example, coding rates at the other inputs of the channel or source, if their number is greater than one. This makes the theory better proportioned and more comprehensible.
Concerning the methods of bound construction, it turns out that Shannon's random coding method [191] of proving the existence of codes with definite properties can be applied with the same success to the study of the rate-reliability function. For the deduction of converse-coding-theorem-type upper bounds (so-called sphere packing bounds), E. Haroutunian proposed a simple combinatorial method [98, 102], which one can apply to various systems. This method is based on the proof of the strong converse coding theorem, as in the method put forth in [99] and used by other authors [35, 51, 152] for the deduction of the sphere packing bound for the reliability function. Moreover, the upper bound of $C(E)$, by passage to the limit $E \to \infty$, becomes an upper bound for the zero-error capacity $C_0$.

We note the following practically useful circumstance: the comparison of the analytical form of the sphere packing bound for $C(E)$ with the expression for the capacity $C$ in some cases gives us the possibility to write down formally the bound for each system for which the achievable rates region (capacity) is known. In rate-reliability-distortion theory, an advantage of the approach is the technical ease of treating the coding rate as a function of the distortion and the error exponent, which allows one to readily convert results from the rate-reliability-distortion area to rate-distortion ones by looking at the extremal values of the reliability, e.g., $E \to 0$, $E \to \infty$. That fact is especially important when one deals with the multidimensional situation. Having solved the problem of finding the rate-reliability-distortion region of a multiterminal system, the corresponding rate-distortion region can be deduced without effort.
In the literature we know of an early attempt to consider the concept of E-capacity $R(E)$. In [51] (Section 2.5), Csiszár and Körner mention the concept of "generalized capacity" for the DMC as "the capacity" corresponding to a tolerated probability of error $\exp\{-NE\}$ (i.e., the largest $R$ with $E(R) \ge E$). But they limited themselves to considering (Problem 15, Section 2.5) only the case $E \le E_{cr}(W)$, where $E_{cr}(W) = E(R_{cr}(W))$. In some earlier works the rate-reliability function was also considered (e.g., by Fu and Shen [77], Tuncel and Rose [206], and Chen [42]).

E. A. Haroutunian and M. E. Haroutunian [116] have been teaching the concept of E-capacity at Yerevan State University for many years.

1.3 Notations for Measures of Information and Some Identities

Here we introduce our notations for the necessary characteristics of Shannon's entropy and mutual information and of the Kullback–Leibler divergence.

In this review finite sets are considered, denoted by $\mathcal{U}, \mathcal{X}, \mathcal{Y}, \mathcal{S}, \dots$. The size of the set $\mathcal{X}$ is denoted by $|\mathcal{X}|$. Random variables (RVs) with values in $\mathcal{U}, \mathcal{X}, \mathcal{Y}, \mathcal{S}, \dots$ are denoted by $U, X, Y, S, \dots$. Probability distributions (PDs) are denoted by $Q, P, V, W, PV, P \circ V, \dots$. Let the PD of RV $X$ be $P \triangleq \{P(x),\ x \in \mathcal{X}\}$, let $V$ be the conditional PD of RV $Y$ for given value $x$,

$$V \triangleq \{V(y|x),\ x \in \mathcal{X},\ y \in \mathcal{Y}\},$$

let the joint PD of RVs $X$ and $Y$ be

$$P \circ V \triangleq \{P \circ V(x,y) = P(x)V(y|x),\ x \in \mathcal{X},\ y \in \mathcal{Y}\},$$

and let the PD of RV $Y$ be

$$PV \triangleq \Big\{ PV(y) = \sum_{x \in \mathcal{X}} P(x)V(y|x),\ y \in \mathcal{Y} \Big\}.$$
The set of messages to be transmitted is denoted by $\mathcal{M}$ and its cardinality by $M$.

We use the following notations (here and in the sequel all $\log$-s and $\exp$-s are of base 2): for the entropy of RV $X$ with PD $P$:

$$H_P(X) \triangleq -\sum_{x \in \mathcal{X}} P(x) \log P(x),$$

for the entropy of RV $Y$ with PD $PV$:

$$H_{P,V}(Y) \triangleq -\sum_{y \in \mathcal{Y}} PV(y) \log PV(y),$$

for the joint entropy of RVs $X$ and $Y$:

$$H_{P,V}(X,Y) \triangleq -\sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} P \circ V(x,y) \log P \circ V(x,y),$$

for the conditional entropy of RV $Y$ relative to RV $X$:

$$H_{P,V}(Y|X) \triangleq -\sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} P(x)V(y|x) \log V(y|x),$$

for the mutual information of RVs $X$ and $Y$:

$$I_{P,V}(X \wedge Y) = I_{P,V}(Y \wedge X) \triangleq \sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} P(x)V(y|x) \log \frac{V(y|x)}{PV(y)},$$

for the conditional mutual information of RVs $X$ and $Y$ relative to RV $U$ with PDs $Q \triangleq \{Q(u),\ u \in \mathcal{U}\}$, $P \triangleq \{P(x|u),\ u \in \mathcal{U},\ x \in \mathcal{X}\}$, $V \triangleq \{V(y|x,u),\ u \in \mathcal{U},\ x \in \mathcal{X},\ y \in \mathcal{Y}\}$:

$$I_{Q,P,V}(X \wedge Y|U) \triangleq \sum_{u \in \mathcal{U},\, x \in \mathcal{X},\, y \in \mathcal{Y}} Q(u)P(x|u)V(y|x,u) \log \frac{V(y|x,u)}{PV(y|u)},$$

for the informational divergence of PDs $P$ and $Q$ on $\mathcal{X}$:

$$D(P \| Q) \triangleq \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)},$$

and for the informational conditional divergence of PDs $P \circ V$ and $P \circ W$ on $\mathcal{X} \times \mathcal{Y}$, where $W \triangleq \{W(y|x),\ x \in \mathcal{X},\ y \in \mathcal{Y}\}$:

$$D(P \circ V \| P \circ W) = D(V \| W | P) \triangleq \sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} P(x)V(y|x) \log \frac{V(y|x)}{W(y|x)}.$$
The following identities are often useful:

$$D(P \circ V \| Q \circ W) = D(P \| Q) + D(V \| W | P),$$

$$H_{P,V}(X,Y) = H_P(X) + H_{P,V}(Y|X) = H_{P,V}(Y) + H_{P,V}(X|Y),$$

$$I_{P,V}(Y \wedge X) = H_{P,V}(Y) - H_{P,V}(Y|X) = H_P(X) + H_{P,V}(Y) - H_{P,V}(X,Y),$$

$$I_{Q,P,V}(Y \wedge X|U) = H_{Q,P,V}(Y|U) - H_{Q,P,V}(Y|X,U),$$

$$I_{Q,P,V}(X \wedge Y,U) = I_{Q,P,V}(X \wedge Y) + I_{Q,P,V}(X \wedge U|Y) = I_{Q,P,V}(X \wedge U) + I_{Q,P,V}(X \wedge Y|U).$$
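To keep these definitions concrete, the following minimal numeric sketch (ours, not part of the monograph) computes $H_P(X)$, $I_{P,V}(X \wedge Y)$, and $D(V\|W|P)$ for small arrays; the example distributions are arbitrary and all logarithms are base 2, as above.

```python
import numpy as np

def entropy(p):
    """H(p) = -sum_x p(x) log2 p(x), with the convention 0 log 0 = 0."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(P, V):
    """I_{P,V}(X ^ Y) = H_{P,V}(Y) - H_{P,V}(Y|X); rows of V are V(.|x)."""
    PV = P @ V                                    # output PD PV(y)
    H_cond = sum(P[x] * entropy(V[x]) for x in range(len(P)))
    return entropy(PV) - H_cond

def cond_divergence(V, W, P):
    """D(V||W|P) = sum_x P(x) sum_y V(y|x) log2(V(y|x)/W(y|x))."""
    d = 0.0
    for x in range(len(P)):
        for y in range(V.shape[1]):
            if V[x, y] > 0:
                d += P[x] * V[x, y] * np.log2(V[x, y] / W[x, y])
    return d

P = np.array([0.5, 0.5])
V = np.array([[0.9, 0.1], [0.2, 0.8]])
W = np.array([[0.8, 0.2], [0.3, 0.7]])
# The first identity above reduces to D(V||W|P) when the input PDs agree.
print(mutual_information(P, V), cond_divergence(V, W, P))
```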

1.4 Basics of the Method of Types

Our proofs will be based on the method of types [49, 51], one of the important technical tools in Shannon Theory. It grew from one of Shannon's key notions, the "typical sequence," which was developed and applied in many works, particularly in the books of Wolfowitz [222], Csiszár and Körner [51], Cover and Thomas [48], and Yeung [224]. The idea of the method of types is to partition the set of all $N$-length sequences into classes according to their empirical distributions (types).

The type $P$ of a sequence (or vector) $\mathbf{x} = (x_1, \dots, x_N) \in \mathcal{X}^N$ is the PD $P = \{P(x) = N(x|\mathbf{x})/N,\ x \in \mathcal{X}\}$, where $N(x|\mathbf{x})$ is the number of repetitions of symbol $x$ among $x_1, \dots, x_N$. The joint type of $\mathbf{x}$ and $\mathbf{y} \in \mathcal{Y}^N$ is the PD $\{N(x,y|\mathbf{x},\mathbf{y})/N,\ x \in \mathcal{X},\ y \in \mathcal{Y}\}$, where $N(x,y|\mathbf{x},\mathbf{y})$ is the number of occurrences of the symbol pair $(x,y)$ in the pair of vectors $(\mathbf{x},\mathbf{y})$. In other words, the joint type is the type of the sequence $(x_1,y_1), (x_2,y_2), \dots, (x_N,y_N)$ from $(\mathcal{X} \times \mathcal{Y})^N$.

We say that the conditional type of $\mathbf{y}$ for given $\mathbf{x}$ is the PD $V = \{V(y|x),\ x \in \mathcal{X},\ y \in \mathcal{Y}\}$ if $N(x,y|\mathbf{x},\mathbf{y}) = N(x|\mathbf{x})V(y|x)$ for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$. The set of all PDs on $\mathcal{X}$ is denoted by $\mathcal{P}(\mathcal{X})$, and the subset of $\mathcal{P}(\mathcal{X})$ consisting of the possible types of sequences $\mathbf{x} \in \mathcal{X}^N$ is denoted by $\mathcal{P}_N(\mathcal{X})$. The set of vectors $\mathbf{x}$ of type $P$ is denoted by $T_P^N(X)$ (with $T_P^N(X) = \emptyset$ for PD $P \notin \mathcal{P}_N(\mathcal{X})$). The set of all sequences $\mathbf{y} \in \mathcal{Y}^N$ of conditional type $V$ for given $\mathbf{x} \in T_P^N(X)$ is denoted by $T_{P,V}^N(Y|\mathbf{x})$ and is called the $V$-shell of $\mathbf{x}$. The set of all possible $V$-shells for $\mathbf{x}$ of type $P$ is denoted by $\mathcal{V}_N(\mathcal{Y}, P)$.
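As a small illustration (ours) of these definitions, the following snippet computes the type and the joint type of short binary sequences; the sequences themselves are arbitrary examples.

```python
from collections import Counter

def type_of(seq, alphabet):
    """Empirical distribution N(a|seq)/N of a sequence."""
    n = len(seq)
    c = Counter(seq)
    return {a: c[a] / n for a in alphabet}

def joint_type(x, y, X, Y):
    """Type of the paired sequence ((x1,y1), ..., (xN,yN))."""
    n = len(x)
    c = Counter(zip(x, y))
    return {(a, b): c[(a, b)] / n for a in X for b in Y}

x = (0, 1, 1, 0, 1, 1, 0, 1)
y = (0, 0, 1, 0, 1, 1, 1, 1)
print(type_of(x, (0, 1)))                  # {0: 0.375, 1: 0.625}
print(joint_type(x, y, (0, 1), (0, 1)))    # N(a,b|x,y)/N for each pair
```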
In the following lemmas, very useful properties of types are formulated; for proofs see [49, 51, 63].

Lemma 1.1. (Type counting)

$$|\mathcal{P}_N(\mathcal{X})| < (N+1)^{|\mathcal{X}|}, \tag{1.1}$$

$$|\mathcal{V}_N(\mathcal{Y}, P)| < (N+1)^{|\mathcal{X}||\mathcal{Y}|}. \tag{1.2}$$

Lemma 1.2. For any type $P \in \mathcal{P}_N(\mathcal{X})$,

$$(N+1)^{-|\mathcal{X}|} \exp\{N H_P(X)\} < |T_P^N(X)| \le \exp\{N H_P(X)\}, \tag{1.3}$$

and for any conditional type $V$ and $\mathbf{x} \in T_P^N(X)$,

$$(N+1)^{-|\mathcal{X}||\mathcal{Y}|} \exp\{N H_{P,V}(Y|X)\} < |T_{P,V}^N(Y|\mathbf{x})| \le \exp\{N H_{P,V}(Y|X)\}. \tag{1.4}$$

Lemma 1.3. If $\mathbf{x} \in T_P^N(X)$ and $\mathbf{y} \in T_{P,V}^N(Y|\mathbf{x})$, then

$$Q^N(\mathbf{x}) = \exp\{-N(H_P(X) + D(P\|Q))\}, \tag{1.5}$$

$$W^N(\mathbf{y}|\mathbf{x}) = \exp\{-N(H_{P,V}(Y|X) + D(V\|W|P))\}. \tag{1.6}$$
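The bounds of Lemma 1.2 are easy to check numerically for a binary alphabet, where a type class is a binomial coefficient; the following sketch (ours) verifies (1.3) for a few arbitrary lengths.

```python
from math import comb, log2

def check(N, k):
    """|T_P^N(X)| = C(N, k) for the binary type P = (1 - k/N, k/N)."""
    p = k / N
    H = -(p * log2(p) + (1 - p) * log2(1 - p))   # H_P(X) in bits
    size = comb(N, k)                            # exact type class size
    lower = 2 ** (N * H) / (N + 1) ** 2          # (N+1)^{-|X|} 2^{N H_P}
    upper = 2 ** (N * H)
    print(N, k, lower < size <= upper)           # True by (1.3)

for N, k in [(8, 2), (16, 4), (64, 16)]:
    check(N, k)
```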

Some authors frequently apply known facts of the theory of large deviations [48] for the proofs of information-theoretical results. In the tutorial [54], Csiszár and Shields deduce results on large deviations using the method of types. This method helps in a better perception of the subject, because the process of inference in all cases is based on the examination of the types of vectors. That is why we prefer to use the method of types.
2
E-capacity of the Discrete Memoryless Channel

2.1 Channel Coding and Error Probability: Shannon's Theorem

Let $\mathcal{X}$, $\mathcal{Y}$ be finite sets and $W \triangleq \{W(y|x),\ x \in \mathcal{X},\ y \in \mathcal{Y}\}$ be a stochastic matrix.

Definition 2.1. A discrete memoryless channel (DMC) $W$ with input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$ is defined by a stochastic matrix of transition probabilities

$$W : \mathcal{X} \to \mathcal{Y}.$$

An element $W(y|x)$ of the matrix is the conditional probability of receiving the symbol $y \in \mathcal{Y}$ at the channel's output if the symbol $x \in \mathcal{X}$ is transmitted at the input.

The model for $N$ actions of the channel $W$ is described by the stochastic matrix

$$W^N : \mathcal{X}^N \to \mathcal{Y}^N,$$
Fig. 2.1 Communication system with noisy channel.

an element of which, $W^N(\mathbf{y}|\mathbf{x})$, is the conditional probability of receiving the vector $\mathbf{y} \in \mathcal{Y}^N$ when the vector $\mathbf{x} \in \mathcal{X}^N$ is transmitted. We consider only memoryless channels, which operate at each moment of time independently of the previous and subsequent transmitted or received symbols, so for all $\mathbf{x} \in \mathcal{X}^N$ and $\mathbf{y} \in \mathcal{Y}^N$

$$W^N(\mathbf{y}|\mathbf{x}) = \prod_{n=1}^N W(y_n|x_n). \tag{2.1}$$

Shannon's model of a two-terminal channel is presented in Figure 2.1. Let $\mathcal{M}$ denote the set of messages and $M$ the number of messages.

Definition 2.2. An $N$-block code $(f,g)$ for the channel $W$ is a pair of mappings, where $f : \mathcal{M} \to \mathcal{X}^N$ is the encoding and $g : \mathcal{Y}^N \to \mathcal{M}$ is the decoding. $N$ is called the code length, and $M$ is called the code volume.

Definition 2.3. The probability of erroneous transmission of the message $m \in \mathcal{M}$ over the channel using the code $(f,g)$ is defined as

$$e(m) \triangleq W^N(\mathcal{Y}^N - g^{-1}(m)\,|\,f(m)) = 1 - W^N(g^{-1}(m)\,|\,f(m)). \tag{2.2}$$

We consider two versions of the error probability of the code $(f,g)$: the maximal probability of error

$$e(f,g,N,W) \triangleq \max_{m \in \mathcal{M}} e(m), \qquad e(M,N,W) \triangleq \min_{(f,g)} e(f,g,N,W), \tag{2.3}$$

where the minimum is taken over all codes $(f,g)$ of volume $M$; and the average probability of error for equiprobable messages

$$\overline{e}(f,g,N,W) \triangleq \frac{1}{M} \sum_{m \in \mathcal{M}} e(m), \tag{2.4}$$
with

$$\overline{e}(M,N,W) \triangleq \min_{(f,g)} \overline{e}(f,g,N,W)$$

as the minimum average probability over all possible codes of length $N$ and volume $M$. It is clear that always

$$\overline{e}(f,g,N,W) \le e(f,g,N,W).$$

Definition 2.4. The transmission rate of a code $(f,g)$ of length $N$ and volume $M$ is

$$R(f,g,N) \triangleq \frac{1}{N} \log M. \tag{2.5}$$

The channel coding problem is the following: it is necessary to make the message set $\mathcal{M}$ of the code as large as possible while keeping the maximal (or average) probability of error low. The problem is considered in the asymptotic sense as $N \to \infty$.
One of Shannon's main discoveries [191] is that he formulated, and for some channels justified, the statement which consists of the following direct and converse parts and is now known as Shannon's Theorem or the Channel Coding Theorem.

It is possible to characterize each channel $W$ by a number $C(W)$, called the capacity, such that when $N \to \infty$:
for $0 < R < C(W)$ there exist codes with $M = \exp\{NR\}$ codewords and average probability of error $\overline{e}(N,M,W)$ going to 0;
for $R > C(W)$, for any code with $M = \exp\{NR\}$ codewords, the average probability of error $\overline{e}(N,M,W)$ goes to 1.

Shannon introduced the notion of mutual information and discovered that in the case of the DMC $W$

$$C(W) = \max_P I_{P,W}(X \wedge Y), \tag{2.6}$$

where $P = \{P(x),\ x \in \mathcal{X}\}$ is the PD of the input symbols $x$.
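For a concrete feel of (2.6), the following sketch (ours, not from the text) computes $C(W)$ by the classical Blahut–Arimoto iteration; the channel matrix is an arbitrary example and the implementation assumes strictly positive entries of $W$ to keep the logarithms finite.

```python
import numpy as np

def capacity_blahut_arimoto(W, iters=500):
    """Maximize I_{P,W}(X ^ Y) over input PDs P; W rows are W(.|x) > 0."""
    nx, _ = W.shape
    P = np.full(nx, 1.0 / nx)                 # start from the uniform PD
    for _ in range(iters):
        PV = P @ W                            # output PD
        # exponent of D(W(.|x) || PV) in base 2, one value per input x
        D = 2.0 ** np.sum(W * np.log2(W / PV), axis=1)
        P = P * D / np.sum(P * D)             # multiplicative update
    PV = P @ W
    C = float(np.sum(P * np.sum(W * np.log2(W / PV), axis=1)))
    return C, P

W = np.array([[0.9, 0.1],
              [0.2, 0.8]])
C, P_opt = capacity_blahut_arimoto(W)
print(C, P_opt)    # capacity in bits and the maximizing input PD
```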


The same result is valid for the case of maximal probability of error. One of the first proofs of Shannon's theorem was given in the works of Feinstein [74, 75], where it was also established that for $R < C(W)$ the probability of error tends to zero exponentially with growing $N$.

For a given channel $W$ and given rate $R$, the optimal exponent $E(R)$ of the exponential decrease of the error probability was first considered by Shannon [195], who called it the reliability function.

Definition 2.5. The reliability function $E(R,W)$ of the channel $W$ is defined as

$$E(R,W) \triangleq \lim_{N \to \infty} -\frac{1}{N} \log e(M,N,W), \quad M = 2^{NR},\ 0 < R < C(W). \tag{2.7}$$

Shannon also introduced the notion of the zero-error capacity $C_0(W)$ of the channel [192], which is the least upper bound of transmission rates of codes for which, beginning from some $N$, there exists a code $(f,g)$ such that $e(f,g,N,W) = 0$. Thus, $E(R,W) = 0$ for $R > C(W)$, and $E(R,W) = \infty$ for $0 \le R < C_0(W)$.

In the case of the DMC, the capacity and the reliability function of the channel do not depend asymptotically on whether the maximal or the average error probability is considered [222].

For the reliability function $E(R,W)$ of a given DMC $W$, upper and lower bounds and their improvements were obtained by Elias [69], Fano [72], Dobrushin [57], Gallager [78], Forney [76], Shannon et al. [198], Haroutunian [99], Blahut [34, 35], Csiszár et al. [52], Jelinek [150], and others.

We recall here expressions for the sphere packing bound $E_{sp}(R,W)$ for the reliability function of the DMC, the random coding bound $E_r(R,W)$, and the expurgated bound $E_x(R,W)$. These names refer to the techniques by which similar bounds were first obtained.

The modern form of the sphere packing bound [51] $E_{sp}(R,W)$ was introduced by Haroutunian [99] and independently by Blahut [34]:

$$E_{sp}(R,P,W) \triangleq \min_{V : I_{P,V}(X \wedge Y) \le R} D(V\|W|P), \tag{2.8}$$

$$E_{sp}(R,W) \triangleq \max_P E_{sp}(R,P,W). \tag{2.9}$$

Theorem 2.1. For any DMC $W$ and all $R \in (0, C(W))$ the following inequality holds:

$$E(R,W) \le E_{sp}(R,W).$$

The proofs are in [35, 51, 99, 152]; see also [178].
A form of the random coding bound of the reliability function $E_r(R,W)$, written similarly to (2.8) and (2.9), was introduced by Csiszár and Körner [51] and defined as

$$E_r(R,P,W) \triangleq \min_V \big( D(V\|W|P) + |I_{P,V}(X \wedge Y) - R|^+ \big),$$

where $V$ runs over the set of all channels $V : \mathcal{X} \to \mathcal{Y}$, and

$$E_r(R,W) \triangleq \max_P E_r(R,P,W).$$

The improved lower bound $E_x(R,W)$, first obtained by Gallager [78] and called the expurgated bound, in the formulation of Csiszár and Körner [51] is the following:

$$E_x(R,P,W) = \min_{V :\ P_X = P_{\widetilde X} = P,\ I_{P,V}(X \wedge \widetilde X) \le R} \big[ \mathbb{E}\, d_B(X, \widetilde X) + I_{P,V}(X \wedge \widetilde X) - R \big],$$

$$E_x(R,W) = \max_P E_x(R,P,W),$$

where $I_{P,V}(X \wedge \widetilde X)$ is the mutual information of the RVs $X$ and $\widetilde X$, such that $P \circ V(x, \widetilde x) = P(x) V(\widetilde x | x)$, and

$$d_B(x, \widetilde x) \triangleq -\log \sum_{y \in \mathcal{Y}} \sqrt{W(y|x)\, W(y|\widetilde x)} \tag{2.10}$$

is the Bhattacharyya distance [35, 50] between $x$ and $\widetilde x$, both from $\mathcal{X}$.
Theorem 2.2. For any DMC $W$ and $R \in (0, C(W))$ the following inequality holds:

$$E(R,W) \ge \max(E_r(R,W), E_x(R,W)).$$

Theorem 2.3. If the capacity $C(W)$ of a channel $W$ is positive, then for sufficiently small values of $R$ we have $E_x(R,W) > E_r(R,W)$.

The smallest value of $R$ at which the convex curve $E_{sp}(R,W)$ meets its supporting line of slope $-1$ is called the critical rate and is denoted by $R_{cr}$. The comparison of $E_{sp}(R,W)$ and $E_r(R,W)$ results in

Corollary 2.1. For $R_{cr} \le R < C(W)$ the reliability function of the DMC $W$ is known exactly: $E(R,W) = E_{sp}(R,W) = E_r(R,W)$.

An essential refinement of the upper bound for the reliability function of the DMC, called the linear refinement, was found by Shannon et al. [198].

A list code for a channel $W$ is a code $(f,g)$ such that the range of $g$ consists of subsets of a fixed number $L$ of messages from $\mathcal{M}$. For each received vector $\mathbf{y}$ the decoder produces a list of $L$ messages, and an error occurs if the sent message is not included in the obtained list. List decoding has theoretical and practical importance. The following remark, made by an anonymous reviewer, pertinently emphasizes the practical usefulness of list decoding.

Remark 2.1. List-decoding exponents are important because they reveal that the random coding and sphere packing bounds are essentially tight at all rates for point-to-point coding, as long as we are willing to live with a small rate-dependent list size while decoding. This makes the high-reliability low-rate story much more intriguing and puts the gap with expurgation into new perspective.
Denote by $e(N,M,L)$ the optimal average probability of error for list codes of length $N$ with $M$ messages and list size $L$. As a theoretical tool, list decoding was used by Shannon et al. in [198] in the inequality

$$e(N_1 + N_2, M, L_2) \ge e(N_1, M, L_1)\, e(N_2, L_1 + 1, L_2), \tag{2.11}$$

with whose help the above-mentioned refinement of the upper bound for the reliability function was obtained.

The graph of the typical behavior of the reliability function bounds for the DMC is given in Figure 2.2.

In [100] Haroutunian proved that the reliability function $E(R,W)$ of the DMC is a continuous and strictly monotone function of $R$ for all $R > C_0(W)$. This fact follows from the inequality

$$E\Big(\frac{R_1 + R_2}{2}, W\Big) \le \frac{E_{sp}(R_1, W) + E(R_2, W)}{2}, \tag{2.12}$$

which is valid for all $R_1 \le R_2 < C(W)$. The inequality (2.12) in turn follows from the inequality (2.11).

Fig. 2.2 Typical behavior of the bounds for E(R, W ) of DMC W .


The mentioned properties of the reliability function $E(R,W)$ ensure the existence of the function inverse to $E(R,W)$, which we denote by $R(E,W)$, call the E-capacity, and investigate in several works.

2.2 E-capacity (Rate-Reliability Function) of DMC

The same dependence between $E$ and $R$ as in (2.7) can be investigated taking the reliability $E$ as the independent variable. Let us consider codes whose error probabilities decrease exponentially with a given exponent $E$:

$$e(f,g,N,W) \le \exp\{-NE\}. \tag{2.13}$$

Denote by $M(E,N,W)$ the best volume of a code of length $N$ for the channel $W$ satisfying condition (2.13) for given reliability $E > 0$.

Definition 2.6. The rate-reliability function, which by analogy with the capacity we call the E-capacity, is

$$R(E,W) = C(E,W) \triangleq \lim_{N \to \infty} \frac{1}{N} \log M(E,N,W). \tag{2.14}$$

For the DMC it was proved [100] that $E(R,W)$ is a continuous function of $R$ for $R \in (C_0(W), C(W))$ and is the inverse of the function $R(E,W)$.

As in the case of the capacity, the E-capacity is called maximal or average and denoted, correspondingly, $C(E,W)$ or $\overline{C}(E,W)$, depending on which error probability is considered in (2.14). It is clear that for $\infty > E > 0$

$$C_0(W) \le C(E,W) \le \overline{C}(E,W) \le C(W).$$

As we shall see later, the obtained bounds in the case of the DMC do not depend on the kind (maximal or average) of error probability.

2.3 Sphere Packing Bound for E-capacity

Now we present a very simple method of derivation (first expounded in [102]) of the upper bound, called the sphere packing bound and denoted by $R_{sp}(E,W)$, of the E-capacity $\overline{C}(E,W)$ for the average error probability. This bound, as the name shows, is the analogue of the sphere packing bound (2.8), (2.9) for $E(R,W)$.

Let $V : \mathcal{X} \to \mathcal{Y}$ be a stochastic matrix. Consider the following functions:

$$R_{sp}(P,E,W) \triangleq \min_{V : D(V\|W|P) \le E} I_{P,V}(X \wedge Y), \qquad R_{sp}(E,W) \triangleq \max_P R_{sp}(P,E,W). \tag{2.15}$$

Theorem 2.4. For the DMC $W$ and $E > 0$ the following inequalities hold:

$$C(E,W) \le \overline{C}(E,W) \le R_{sp}(E,W).$$

Proof. Let $E$ and $\delta$ be given such that $E > \delta > 0$. Let a code $(f,g)$ of length $N$ be given, let $R$ be the rate of the code, and let the average error probability satisfy the condition

$$\overline{e}(f,g,N,W) \le \exp\{-N(E - \delta)\},$$

which according to definitions (2.2) and (2.4) is

$$\frac{1}{M} \sum_m W^N\{\mathcal{Y}^N - g^{-1}(m)\,|\,f(m)\} \le \exp\{-N(E - \delta)\}. \tag{2.16}$$

Since the number of messages $M$ can be presented as the sum over types of the numbers of codewords of each type, $M = \sum_P |f(\mathcal{M}) \cap T_P^N(X)|$, and the number of all types $P \in \mathcal{P}_N(\mathcal{X})$ is less than $(N+1)^{|\mathcal{X}|}$ (see (1.1)), there exists a "major" type $P^*$ such that

$$|f(\mathcal{M}) \cap T_{P^*}^N(X)| \ge M (N+1)^{-|\mathcal{X}|}. \tag{2.17}$$

Now on the left-hand side of (2.16) we can keep only the codewords of type $P^*$ and the part of the output vectors $\mathbf{y}$ of conditional type $V$:

$$\sum_{m : f(m) \in T_{P^*}^N(X)} W^N\{T_{P^*,V}^N(Y|f(m)) - g^{-1}(m)\,|\,f(m)\} \le M \exp\{-N(E - \delta)\}.$$
For $\mathbf{y} \in T_{P^*,V}^N(Y|f(m))$ we obtain from (1.6) that

$$\sum_{m : f(m) \in T_{P^*}^N(X)} \Big( |T_{P^*,V}^N(Y|f(m))| - |T_{P^*,V}^N(Y|f(m)) \cap g^{-1}(m)| \Big)\, W^N(\mathbf{y}|f(m)) \le M \exp\{-N(E - \delta)\},$$

or

$$\sum_{m : f(m) \in T_{P^*}^N(X)} |T_{P^*,V}^N(Y|f(m))| - \frac{M \exp\{-N(E - \delta)\}}{\exp\{-N(D(V\|W|P^*) + H_{P^*,V}(Y|X))\}} \le \sum_{m : f(m) \in T_{P^*}^N(X)} \big| T_{P^*,V}^N(Y|f(m)) \cap g^{-1}(m) \big|.$$

It follows from the definition of the decoding function $g$ that the sets $g^{-1}(m)$ are disjoint, therefore

$$\sum_{m : f(m) \in T_{P^*}^N(X)} \big| T_{P^*,V}^N(Y|f(m)) \cap g^{-1}(m) \big| \le |T_{P^*V}^N(Y)|.$$

Then from (1.4) we have

$$|f(\mathcal{M}) \cap T_{P^*}^N(X)|\, (N+1)^{-|\mathcal{X}||\mathcal{Y}|} \exp\{N H_{P^*,V}(Y|X)\} - M \exp\{N(D(V\|W|P^*) + H_{P^*,V}(Y|X) - E + \delta)\} \le \exp\{N H_{P^*V}(Y)\}.$$

Taking into account (2.17) we arrive at

$$M \le \frac{\exp\{N I_{P^*,V}(X \wedge Y)\}}{(N+1)^{-|\mathcal{X}|(|\mathcal{Y}|+1)} - \exp\{N(D(V\|W|P^*) - E + \delta)\}}.$$

The right-hand side of this inequality can be minimized by the choice of the conditional type $V$, keeping the denominator positive, which takes place for large $N$ when the following inequality holds:

$$D(V\|W|P^*) \le E - \delta.$$

The statement of Theorem 2.4 follows from the definitions of $R(E,W)$ and $R_{sp}(E,W)$ and from the continuity in $E$ of the function $R_{sp}(P,E,W)$.
The same bound in the case of maximal error probability can be proved similarly, but it also follows from the given proof.
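Before turning to an example, we note that $R_{sp}(P,E,W)$ of (2.15) can be evaluated numerically by brute force for small alphabets; the following sketch (ours, not from the text) scans binary stochastic matrices $V$ on a grid. The channel, input PD, and reliability values are arbitrary examples.

```python
import numpy as np

def I(P, V):
    """Mutual information I_{P,V}(X ^ Y) in bits for binary alphabets."""
    PV = P @ V
    return sum(P[x] * V[x, y] * np.log2(V[x, y] / PV[y])
               for x in range(2) for y in range(2) if V[x, y] > 0)

def D(V, W, P):
    """Conditional divergence D(V||W|P) in bits."""
    return sum(P[x] * V[x, y] * np.log2(V[x, y] / W[x, y])
               for x in range(2) for y in range(2) if V[x, y] > 0)

def R_sp(P, E, W, grid=200):
    """min of I_{P,V}(X ^ Y) over V with D(V||W|P) <= E, gridded."""
    best = np.inf
    for v1 in np.linspace(1e-6, 1 - 1e-6, grid):
        for v2 in np.linspace(1e-6, 1 - 1e-6, grid):
            V = np.array([[1 - v1, v1], [v2, 1 - v2]])
            if D(V, W, P) <= E:
                best = min(best, I(P, V))
    return best

W = np.array([[0.9, 0.1], [0.2, 0.8]])
P = np.array([0.5, 0.5])
for E in (0.01, 0.05, 0.2):
    print(E, R_sp(P, E, W))   # R_sp shrinks as the reliability E grows
```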

Example 2.1. We shall calculate $R_{sp}(E,W)$ for the binary symmetric channel (BSC). Consider the BSC $W$ with

$$\mathcal{X} = \{0,1\}, \quad \mathcal{Y} = \{0', 1'\},$$

$$W(0'|1) = W(1'|0) = w_1 > 0, \quad W(0'|0) = W(1'|1) = w_2 > 0.$$

Correspondingly, for another BSC $V$ on the same $\mathcal{X}$ and $\mathcal{Y}$ we denote

$$V(0'|1) = V(1'|0) = v_1, \quad V(0'|0) = V(1'|1) = v_2.$$

It is clear that $w_1 + w_2 = 1$ and $v_1 + v_2 = 1$.

The maximal value of the mutual information $I_{P,V}(X \wedge Y)$ in the definition of $R_{sp}(E,W)$ is obtained at $p^*(0) = p^*(1) = 1/2$ because of the symmetry of the channel, therefore

$$I_{P^*,V}(X \wedge Y) = 1 + v_1 \log v_1 + v_2 \log v_2.$$

The condition $D(V\|W|P^*) \le E$ takes the following form:

$$v_1 \log \frac{v_1}{w_1} + v_2 \log \frac{v_2}{w_2} \le E.$$

So the problem of extremum with restrictions must be solved (see (2.15)):

$$-(1 + v_1 \log v_1 + v_2 \log v_2) = \max, \qquad v_1 \log \frac{v_1}{w_1} + v_2 \log \frac{v_2}{w_2} - E = 0, \qquad v_1 + v_2 = 1.$$

Using the Kuhn–Tucker theorem [152, 230], we find that $\bar v_1$ and $\bar v_2$ give the solution of the problem if and only if there exist $\lambda_1 > 0$, $\lambda_2 > 0$ satisfying the following conditions:

$$\frac{\partial}{\partial v_i}(-1 - v_1 \log v_1 - v_2 \log v_2) + \lambda_1 \frac{\partial}{\partial v_i}\Big(-v_1 \log \frac{v_1}{w_1} - v_2 \log \frac{v_2}{w_2} + E\Big) + \lambda_2 \frac{\partial}{\partial v_i}(v_1 + v_2 - 1) = 0, \quad i = 1,2,$$

$$\lambda_1 \Big( v_1 \log \frac{v_1}{w_1} + v_2 \log \frac{v_2}{w_2} - E \Big) = 0,$$

which for $\bar v_1$ and $\bar v_2$, giving the maximum, are equivalent to

$$\log \bar v_i + \log e = -\lambda_1 \Big( \log \frac{\bar v_i}{w_i} + \log e \Big) + \lambda_2, \quad i = 1,2, \qquad \bar v_1 \log \frac{\bar v_1}{w_1} + \bar v_2 \log \frac{\bar v_2}{w_2} = E. \tag{2.18}$$

Solving the first two equations of (2.18) we obtain

$$\bar v_i = w_i^{\frac{\lambda_1}{1+\lambda_1}}\, 2^{-\frac{1}{1+\lambda_1}(\lambda_1 - \lambda_2 + 1)\log e}, \quad i = 1,2.$$

Let us denote $s = \frac{\lambda_1}{1+\lambda_1}$ and remember that $\bar v_1 + \bar v_2 = 1$; then as functions of the parameter $s \in (0,1)$ we get

$$\bar v_1 = \frac{w_1^s}{w_1^s + w_2^s}, \qquad \bar v_2 = \frac{w_2^s}{w_1^s + w_2^s}.$$

From the third condition in (2.18) we obtain the parametric expressions for $E$ and $R_{sp}(E,W)$, namely

$$E(s) = \frac{w_1^s}{w_1^s + w_2^s} \log \frac{w_1^{s-1}}{w_1^s + w_2^s} + \frac{w_2^s}{w_1^s + w_2^s} \log \frac{w_2^{s-1}}{w_1^s + w_2^s},$$

$$R_{sp}(s) = 1 + \frac{w_1^s}{w_1^s + w_2^s} \log \frac{w_1^s}{w_1^s + w_2^s} + \frac{w_2^s}{w_1^s + w_2^s} \log \frac{w_2^s}{w_1^s + w_2^s}.$$

It is not complicated to see that we have arrived at the same relation between $R_{sp}$ and $E$ as that given in Theorem 5.8.3 of Gallager's book [79].
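The parametric formulas of Example 2.1 are immediately computable; the sketch below (ours) traces the curve $(E(s), R_{sp}(s))$ for a BSC with an arbitrary crossover probability $w_1 = 0.1$.

```python
import numpy as np

def bsc_sphere_packing_curve(w1, s_values):
    """Parametric (E(s), R_sp(s)) points for a BSC, logs in base 2."""
    w2 = 1.0 - w1
    pts = []
    for s in s_values:
        z = w1 ** s + w2 ** s
        v1, v2 = w1 ** s / z, w2 ** s / z      # tilted distribution
        E = v1 * np.log2(v1 / w1) + v2 * np.log2(v2 / w2)
        R = 1.0 + v1 * np.log2(v1) + v2 * np.log2(v2)
        pts.append((E, R))
    return pts

for E, R in bsc_sphere_packing_curve(0.1, np.linspace(0.05, 0.95, 7)):
    print(f"E = {E:.4f}  R_sp = {R:.4f}")
# As s -> 1 the point approaches (0, C(W)); smaller s gives larger E.
```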

2.4 Random Coding Bound for E-capacity

The result we are going to present is a modification of Theorem 2.5 for $E(R,W)$ from the book by Csiszár and Körner [51]. Consider the function called the random coding bound for $C(E,W)$:

$$R_r(P,E,W) \triangleq \min_{V : D(V\|W|P) \le E} |I_{P,V}(X \wedge Y) + D(V\|W|P) - E|^+, \qquad R_r(E,W) \triangleq \max_P R_r(P,E,W). \tag{2.19}$$

Theorem 2.5. For the DMC $W$ and all $E > 0$ the following bound on the E-capacity holds:

$$R_r(E,W) \le C(E,W) \le \overline{C}(E,W).$$
The proof of this theorem is based on the following modification of the packing Lemma 5.1 from [51].

Lemma 2.6. For given $E > 0$, $\delta \ge 0$, type $P \in \mathcal{P}_N(\mathcal{X})$, and

$$M = \exp\Big\{ N \min_{V : D(V\|W|P) \le E} |I_{P,V}(Y \wedge X) + D(V\|W|P) - E - \delta|^+ \Big\}, \tag{2.20}$$

there exist $M$ distinct vectors $\mathbf{x}(m)$ from $T_P^N(X)$ such that for any $m \in \mathcal{M}$, any conditional types $V, V'$, and $N$ large enough, the following inequality is valid:

$$\Big| T_{P,V}^N(Y|\mathbf{x}(m)) \cap \bigcup_{m' \ne m} T_{P,V'}^N(Y|\mathbf{x}(m')) \Big| \le |T_{P,V}^N(Y|\mathbf{x}(m))| \exp\{-N |E - D(V'\|W|P)|^+\}. \tag{2.21}$$

Proof of Lemma 2.6. For fixed $N$, $M$, and type $P$, let us consider a collection of $M$ not necessarily distinct vectors $\mathbf{x}(m)$ randomly taken from $T_P^N(X)$. Consider the family $\mathcal{C}(M)$ of all such ordered collections $C = (\mathbf{x}(1), \mathbf{x}(2), \dots, \mathbf{x}(M))$.

Notice that if some collection $C$ satisfies (2.21) for every $m$, $V$, $V'$, then $\mathbf{x}(m) \ne \mathbf{x}(m')$ for $m \ne m'$, $m = \overline{1,M}$, $m' = \overline{1,M}$. To see this, it is enough to choose $V = V'$ with $D(V'\|W|P) < E$. If $V'$ is such that $D(V'\|W|P) \ge E$, then $\exp\{-N|E - D(V'\|W|P)|^+\} = 1$, and (2.21) is valid for any $M$. It remains to prove Lemma 2.6 for $V'$ such that $D(V'\|W|P) < E$. Denote

$$A_m(C, V, V') = \Big| T_{P,V}^N(Y|\mathbf{x}(m)) \cap \bigcup_{m' \ne m} T_{P,V'}^N(Y|\mathbf{x}(m')) \Big|$$

and

$$A_m(C) = (N+1)^{|\mathcal{X}||\mathcal{Y}|} \sum_V \sum_{V' : D(V'\|W|P) < E} A_m(C, V, V') \exp\{N(E - D(V'\|W|P) - H_{P,V}(Y|X))\}.$$

On account of (1.4), $C$ satisfies (2.21) for every $m$, $V$, $V'$ if

$$A_m(C) \le 1, \quad m = \overline{1,M}. \tag{2.22}$$
If for some $C \in \mathcal{C}(M)$

$$\frac{1}{M} \sum_{m=1}^M A_m(C) \le \frac{1}{2}, \tag{2.23}$$

then $A_m(C) \le 1$ for at least $M/2$ vectors $\mathbf{x}(m) \in C$. If $C'$ is a subcollection of $C$ with such vectors, then for these $m$

$$A_m(C') \le A_m(C) \le 1.$$

Hence, Lemma 2.6 will be proved if there exists a collection $C \in \mathcal{C}(M)$ satisfying (2.23) with

$$\exp\Big\{ N \min_{V : D(V\|W|P) \le E} |I_{P,V}(X \wedge Y) + D(V\|W|P) - E - \delta|^+ \Big\} \le \frac{M}{2} \le \frac{1}{2} \exp\Big\{ N \min_{V : D(V\|W|P) \le E} |I_{P,V}(X \wedge Y) + D(V\|W|P) - E - \delta/2|^+ \Big\}. \tag{2.24}$$

To prove that (2.23) holds for some $C \in \mathcal{C}(M)$, it suffices to show that for a sequence of $M$ independent RVs $X(M) = (X(1), X(2), \dots, X(M))$, uniformly distributed over $T_P^N(X)$, the following inequality holds:

$$\mathbb{E}\, A_m(X(M)) \le 1/2, \quad m = \overline{1,M}.$$

To this end we observe that

$$\mathbb{E}\, A_m(X(M), V, V') = \mathbb{E}\Big| T_{P,V}^N(Y|X(m)) \cap \bigcup_{m' \ne m} T_{P,V'}^N(Y|X(m')) \Big| = \sum_{\mathbf{y} \in \mathcal{Y}^N} \Pr\Big\{ \mathbf{y} \in T_{P,V}^N(Y|X(m)) \cap \bigcup_{m' \ne m} T_{P,V'}^N(Y|X(m')) \Big\} \le \sum_{m' \ne m} \sum_{\mathbf{y} \in \mathcal{Y}^N} \Pr\{\mathbf{y} \in T_{P,V}^N(Y|X(m))\}\, \Pr\{\mathbf{y} \in T_{P,V'}^N(Y|X(m'))\},$$

because the $X(m)$ are independent and identically distributed.

Let us note that the first probability is different from zero only if $\mathbf{y} \in T_{P,V}^N(Y)$. In this case, for $N$ large enough and every fixed $\mathbf{y} \in T_{P,V}^N(Y)$, we have

$$\Pr\{\mathbf{y} \in T_{P,V}^N(Y|X(m))\} = \frac{|T_{P,V}^N(X|\mathbf{y})|}{|T_P^N(X)|} \le (N+1)^{|\mathcal{X}|} \exp\{-N I_{P,V}(X \wedge Y)\}.$$
The second probability can be estimated in the same way. At last we obtain

$$\mathbb{E}\Big| T_{P,V}^N(Y|X(m)) \cap \bigcup_{m' \ne m} T_{P,V'}^N(Y|X(m')) \Big| \le (N+1)^{2|\mathcal{X}|} (M-1)\, |T_{P,V}^N(Y)| \exp\{-N(I_{P,V}(Y \wedge X) + I_{P,V'}(Y \wedge X))\}.$$

From (2.24), for any $V'$ such that $D(V'\|W|P) < E$, it follows that

$$M - 1 \le \exp\{N(I_{P,V'}(X \wedge Y) + D(V'\|W|P) - E - \delta/2)\},$$

and we get

$$\mathbb{E}\, A_m(X(M), V, V') \exp\{N(E - D(V'\|W|P) - H_{P,V}(Y|X))\} \le (N+1)^{2|\mathcal{X}|} \exp\{-N\delta/2\},$$

or

$$\mathbb{E}\, A_m(X(M)) \le (N+1)^{2|\mathcal{X}| + 3|\mathcal{X}||\mathcal{Y}|} \exp\{-N\delta/2\},$$

and, because of Lemma 1.1, (2.23) is valid for $N$ large enough.

We shall use Lemma 2.6 to prove Theorem 2.5.

Proof. By Lemma 2.6 the existence of $M$ vectors $\mathbf{x}(m)$ satisfying (2.20) and (2.21) is guaranteed for any $E$, $P$, and $\delta$.

Let us apply the decoding rule for the decoder $g$ using the criterion of minimum divergence (see Section 2.6): each $\mathbf{y}$ is decoded to that $m'$ for which, for some $V'$,

$$\mathbf{y} \in T_{P,V'}^N(Y|\mathbf{x}(m')) \quad \text{and} \quad D(V'\|W|P) \text{ is minimal.}$$

The decoder $g$ can make an error if the message $m$ was transmitted but there exists $m' \ne m$ such that for some $V'$

$$\mathbf{y} \in T_{P,V}^N(Y|\mathbf{x}(m)) \cap T_{P,V'}^N(Y|\mathbf{x}(m'))$$

and

$$D(V'\|W|P) \le D(V\|W|P). \tag{2.25}$$


Denote

$$\mathcal{D}(P) \triangleq \{V, V' : \text{(2.25) is valid}\}.$$

We estimate the error probability $e(m)$ in the following way:

$$e(m) \le W^N\Big\{ \bigcup_{\mathcal{D}(P)} T_{P,V}^N(Y|\mathbf{x}(m)) \cap \bigcup_{m' \ne m} T_{P,V'}^N(Y|\mathbf{x}(m')) \,\Big|\, \mathbf{x}(m) \Big\}.$$

Taking into account that the PD $W^N(\mathbf{y}|\mathbf{x}(m))$ is constant for fixed $P$, $V$ (see (1.6)), the last expression can be upper bounded by

$$\sum_{\mathcal{D}(P)} \Big| T_{P,V}^N(Y|\mathbf{x}(m)) \cap \bigcup_{m' \ne m} T_{P,V'}^N(Y|\mathbf{x}(m')) \Big|\, W^N(\mathbf{y}|\mathbf{x}(m)).$$

Now from (2.21), using (1.6), we obtain for the maximal error probability, for sufficiently large $N$,

$$e(f,g,N,W) \le \sum_{\mathcal{D}(P)} \exp\{N H_{P,V}(Y|X)\} \exp\{-N(E - D(V'\|W|P))\} \exp\{-N(H_{P,V}(Y|X) + D(V\|W|P))\} \le \exp\{-N(E - \epsilon)\}.$$

The last inequality is valid because the number of all possible $V, V'$ from $\mathcal{D}(P)$ does not exceed $(N+1)^{2|\mathcal{X}||\mathcal{Y}|}$.
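As with the sphere packing bound, the random coding bound (2.19) admits a direct numeric evaluation for small alphabets; the following brute-force sketch (ours, not from the text) scans binary matrices $V$ on a grid, with arbitrary example parameters.

```python
import numpy as np

def I(P, V):
    PV = P @ V
    return sum(P[x] * V[x, y] * np.log2(V[x, y] / PV[y])
               for x in range(2) for y in range(2) if V[x, y] > 0)

def D(V, W, P):
    return sum(P[x] * V[x, y] * np.log2(V[x, y] / W[x, y])
               for x in range(2) for y in range(2) if V[x, y] > 0)

def R_r(P, E, W, grid=200):
    """min over gridded V with D(V||W|P) <= E of |I + D - E|^+."""
    best = np.inf
    for v1 in np.linspace(1e-6, 1 - 1e-6, grid):
        for v2 in np.linspace(1e-6, 1 - 1e-6, grid):
            V = np.array([[1 - v1, v1], [v2, 1 - v2]])
            d = D(V, W, P)
            if d <= E:
                best = min(best, max(I(P, V) + d - E, 0.0))
    return best

W = np.array([[0.9, 0.1], [0.2, 0.8]])   # an arbitrary example channel
P = np.array([0.5, 0.5])
print([round(R_r(P, E, W), 4) for E in (0.01, 0.05, 0.2)])
```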

2.5 Expurgated Bound for E-capacity

Now we formulate another lower bound for the E-capacity, called the expurgated bound. The proof will be sketched in Section 2.6; it repeats all the steps of the analogous demonstration for the reliability function $E(R,W)$, made by Csiszár and Körner by the method of graph decomposition [50].

Consider the following functions:

$$R_x(P,E,W) \triangleq \min_V |I_{P,V}(X \wedge \widetilde X) + \mathbb{E}\, d_B(X, \widetilde X) - E|^+,$$

where $d_B(X, \widetilde X)$ is the Bhattacharyya distance (2.10), and

$$R_x(E,W) \triangleq \max_P R_x(P,E,W).$$
Theorem 2.7. For the DMC $W$ and any $E > 0$ the following bound holds:

$$R(E,W) \ge \max(R_r(E,W), R_x(E,W)).$$

In the next theorem the region where the upper and lower bounds coincide is pointed out. Let

$$E_{cr}(P,W) = \min\Big\{ E : \frac{\partial R_{sp}(P,E,W)}{\partial E} \ge -1 \Big\}.$$

Theorem 2.8. For the DMC $W$ and PD $P$, for $E \in [0, E_{cr}(P,W)]$ we have

$$R(P,E,W) = R_{sp}(P,E,W) = R_r(P,E,W),$$

and, in particular, for $E = 0$

$$R_{sp}(P,0,W) = R_r(P,0,W) = I_{P,W}(X \wedge Y).$$
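The expurgated bound of this section can likewise be evaluated by brute force for a binary channel; the sketch below (ours, not from the text) computes the Bhattacharyya distance (2.10) in base 2 and scans matrices $V$ on a grid, with arbitrary example parameters.

```python
import numpy as np

def bhattacharyya(W):
    """d_B(x, x~) = -log2 sum_y sqrt(W(y|x) W(y|x~))."""
    n = W.shape[0]
    dB = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            dB[a, b] = -np.log2(np.sum(np.sqrt(W[a] * W[b])))
    return dB

def R_x(P, E, W, grid=200):
    """min over gridded V(x~|x) of |I(X ^ X~) + E d_B(X, X~) - E|^+."""
    dB = bhattacharyya(W)
    best = np.inf
    for v1 in np.linspace(1e-6, 1 - 1e-6, grid):
        for v2 in np.linspace(1e-6, 1 - 1e-6, grid):
            V = np.array([[1 - v1, v1], [v2, 1 - v2]])
            PV = P @ V
            I = sum(P[x] * V[x, t] * np.log2(V[x, t] / PV[t])
                    for x in range(2) for t in range(2) if V[x, t] > 0)
            EdB = sum(P[x] * V[x, t] * dB[x, t]
                      for x in range(2) for t in range(2))
            best = min(best, max(I + EdB - E, 0.0))
    return best

W = np.array([[0.9, 0.1], [0.2, 0.8]])
P = np.array([0.5, 0.5])
print([round(R_x(P, E, W), 4) for E in (0.05, 0.2, 0.5)])
```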

2.6 Random Coding and Expurgated Bounds Derivation by the Method of Graph Decomposition

Alternative methods for demonstrating the existence part of coding theorems are Shannon's random coding method (see Section 2.4) and Wolfowitz's method of maximal codes [222]. In [50] Csiszár and Körner introduced a new original method, based on the lemma of Lovász on graph decomposition.

We now present an account of the scheme of the analogous derivation of the random coding and expurgated bounds for the E-capacity $R(E,W)$ of the DMC, which was first expounded in [116] and appeared in [107]. Theorems 2.7 and 2.8, formulated in Section 2.5, are consequences of the following

Theorem 2.9. For the DMC $W : \mathcal{X} \to \mathcal{Y}$, any $E > \delta > 0$, and type $P \in \mathcal{P}_N(\mathcal{X})$, for sufficiently large $N$ there exist codes $(f,g)$ such that

$$\exp\{-N(E + \delta)\} \le e(f,g,N,W) \le \exp\{-N(E - \delta)\} \tag{2.26}$$


and

$$R(f,g,N) \ge \max(R_r(P, E + \delta, W),\ R_x(P, E + \delta, W)).$$

The proof of Theorem 2.9 consists of a succession of lemmas. The first one is

Lemma 2.10. For any type $P$ and any $r \in (0, |T_P^N(X)|)$ a set $\mathcal{C}$ exists such that $\mathcal{C} \subset T_P^N(X)$, $|\mathcal{C}| \ge r$, and for any $\widetilde{\mathbf{x}} \in \mathcal{C}$ and matrix $V : \mathcal{X} \to \mathcal{X}$ different from the identity matrix, the following inequality holds:

$$|T_{P,V}^N(X|\widetilde{\mathbf{x}}) \cap \mathcal{C}| \le r\, |T_{P,V}^N(X|\widetilde{\mathbf{x}})| \exp\{-N(H_P(X) - \delta_N)\}, \tag{2.27}$$

where $\delta_N = N^{-1}[(|\mathcal{X}|^2 + |\mathcal{X}|)\log(N+1) + 1]$.

For the demonstration of code existence theorems, various "good" decoding rules may be considered. The definitions of those rules may apply different real-valued functions $\alpha$ with domain $\mathcal{X}^N \times \mathcal{Y}^N$. One says that $g_\alpha$ decoding is used if to each $\mathbf{y}$ from $\mathcal{Y}^N$ at the output of the channel the message $m$ is assigned whose codeword $\mathbf{x}(m)$ minimizes $\alpha(\mathbf{x}(m), \mathbf{y})$. One uses functions $\alpha$ which depend only on the type $P$ of $\mathbf{x}$ and the conditional type $V$ of $\mathbf{y}$ for given $\mathbf{x}$. Such functions $\alpha$ can be written in the form $\alpha(P,V)$, and at the respective decoding

$$g_\alpha : \mathcal{Y}^N \to \mathcal{M}_N,$$

the message $m$ corresponds to the vector $\mathbf{y}$ if

$$\alpha(P,V) = \min_{\widetilde V} \alpha(P, \widetilde V), \qquad \mathbf{y} \in T_{P,V}^N(Y|\mathbf{x}(m)) \cap T_{P,\widetilde V}^N(Y|\widetilde{\mathbf{x}}(m)).$$

Here $\widetilde V \triangleq \{\widetilde V(y|\widetilde x),\ \widetilde x \in \mathcal{X},\ y \in \mathcal{Y}\}$ is a matrix different from $V$ but guaranteeing that

$$\sum_{x \in \mathcal{X}} P(x)V(y|x) = \sum_{\widetilde x \in \mathcal{X}} P(\widetilde x)\widetilde V(y|\widetilde x), \quad y \in \mathcal{Y}, \tag{2.28}$$

or, equivalently, $PV = P\widetilde V$.
Previously the following two rules were used [50]: maximum-likelihood decoding, when the accepted codeword $\mathbf{x}(m)$ maximizes the transition probability $W^N(\mathbf{y}|\mathbf{x}(m))$; in this case, according to (1.6),

$$\alpha(P,V) = D(V\|W|P) + H_{P,V}(Y|X); \tag{2.29}$$

and the second decoding rule, called minimum-entropy decoding, according to which the codeword $\mathbf{x}(m)$ minimizing $H_{P,V}(Y|X)$ is accepted, that is,

$$\alpha(P,V) = H_{P,V}(Y|X). \tag{2.30}$$

In [108] and [116] another decoding rule was proposed, by minimization of

$$\alpha(P,V) = D(V\|W|P), \tag{2.31}$$

which can be called minimum-divergence decoding.
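The three $\alpha$-decoding scores (2.29)–(2.31) are functions of the joint type of a codeword and a received vector alone; the following sketch (ours, not from the text) computes all three for an arbitrary example pair, assuming binary alphabets throughout.

```python
import numpy as np
from collections import Counter

def joint_type(x, y, X=(0, 1), Y=(0, 1)):
    n = len(x)
    c = Counter(zip(x, y))
    return {(a, b): c[(a, b)] / n for a in X for b in Y}

def scores(x, y, W):
    """alpha(P, V) for ML (2.29), min-entropy (2.30), min-divergence (2.31)."""
    pv = joint_type(x, y)
    P = {a: pv[(a, 0)] + pv[(a, 1)] for a in (0, 1)}   # type of x
    H = D = 0.0                  # H_{P,V}(Y|X) and D(V||W|P), base 2
    for (a, b), p in pv.items():
        if p > 0:
            V_ba = p / P[a]      # conditional type V(b|a)
            H -= p * np.log2(V_ba)
            D += p * np.log2(V_ba / W[a][b])
    return {"ML (2.29)": D + H, "min-entropy (2.30)": H,
            "min-divergence (2.31)": D}

W = [[0.9, 0.1], [0.2, 0.8]]
x = [0, 0, 1, 1, 0, 1, 0, 1]
y = [0, 1, 1, 1, 0, 1, 0, 0]
print(scores(x, y, W))   # the decoder accepts the codeword of lowest score
```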
Let $\widetilde V = \{\widetilde V(y|x,\widetilde x),\ x \in \mathcal{X},\ \widetilde x \in \mathcal{X},\ y \in \mathcal{Y}\}$ be the conditional distribution of RV $Y$ given the values of RV $X$ and RV $\widetilde X$ such that

$$\sum_{\widetilde x} P(\widetilde x) V(x|\widetilde x) \widetilde V(y|x,\widetilde x) = P(x)V(y|x), \tag{2.32}$$

$$\sum_{x} P'(x) V(\widetilde x|x) \widetilde V(y|x,\widetilde x) = P(\widetilde x)\widetilde V(y|\widetilde x). \tag{2.33}$$

Following the notation of [50] we write

$$\widetilde V \prec_\alpha V, \quad \text{if } \alpha(P, \widetilde V) \le \alpha(P, V) \text{ and } P\widetilde V = PV.$$

Let us denote

$$R_\alpha(P,E,W) \triangleq \min_{V, \widetilde V :\ \widetilde V \prec_\alpha V,\ D(V\|W|P) \le E} \Big\{ I_{P,V}(X \wedge \widetilde X) + |I_{P,V,\widetilde V}(Y \wedge \widetilde X|X) + D(V\|W|P) - E|^+ \Big\}, \tag{2.34}$$

where the RVs $X, \widetilde X, Y$ take values, correspondingly, in $\mathcal{X}, \mathcal{X}, \mathcal{Y}$, such that the following is valid: both $X$ and $\widetilde X$ have distribution $P$ and $PV = P$, and $\widetilde V$ is the conditional distribution of RV $Y$ given $X$ and $\widetilde X$, satisfying (2.32), (2.33), and (2.28).
The main point of the theorem's demonstration is

Proposition 2.11. For any DMC $W$, any type $P \in \mathcal{P}_N(\mathcal{X})$, any $E > 0$, $\delta_N > 0$, $\delta'_N > 0$, for all sufficiently large $N$ there exist codes $(f, g_\alpha)$ such that

$$\exp\{-N(E + \delta'_N/2)\} \le e(f, g_\alpha, N, W) \le \exp\{-N(E - \delta'_N/2)\}, \tag{2.35}$$

and

$$R(P, f, g_\alpha, N) \ge R_\alpha(P, E + \delta'_N, W).$$

Remark 2.2. The functions $R_\alpha(P,E,W)$ and $R_x(P,E,W)$ depend on $E$ continuously.

Lemma 2.12. Let us introduce the following functions:

$$R_{\alpha,r}(P,E,W) \triangleq \min_{V : D(V\|W|P) \le E}\ \min_{\widetilde V : \widetilde V \prec_\alpha V} |I_{P,\widetilde V}(Y \wedge \widetilde X) + D(V\|W|P) - E|^+,$$

$$R_{\alpha,x}(P,E,W) \triangleq \min_{V, \widetilde V :\ \widetilde V \prec_\alpha V} \Big\{ I_{P,V}(X \wedge \widetilde X) + |I_{P,V,\widetilde V}(Y \wedge \widetilde X|X) + D(V\|W|P) - E|^+ \Big\}.$$

Then

$$R_\alpha(P,E,W) \ge \max[R_{\alpha,x}(P,E,W),\ R_{\alpha,r}(P,E,W)].$$

Lemma 2.13. A point $E_\alpha^*(P,W)$ exists such that

$$\max[R_{\alpha,x}(P,E,W),\ R_{\alpha,r}(P,E,W)] = \begin{cases} R_{\alpha,r}(P,E,W), & \text{when } E \le E_\alpha^*(P,W), \\ R_{\alpha,x}(P,E,W), & \text{when } E \ge E_\alpha^*(P,W). \end{cases}$$
Lemma 2.14. For each $\alpha$-decoding (2.29), (2.30), and (2.31),

$$R_{\alpha,x}(P,E,W) \le R_x(P,E,W);$$

moreover, for maximum-likelihood decoding (2.29) the equality holds.

Lemma 2.15. For each $\alpha$-decoding,

$$R_{\alpha,r}(P,E,W) \le R_r(P,E,W);$$

moreover, for

— maximum-likelihood decoding,
— minimum-entropy decoding,
— minimum-divergence decoding,

the equality holds.

The proof of Theorems 2.9 and 2.7 constitutes the unification of Lemmas 2.10–2.15 and Proposition 2.11.

2.7 Comparison of Bounds for E-capacity

Lemma 2.16. For a given DMC $W$, type $P$, and numbers $0 \le E' \le E$,

$$R_r(P,E,W) = \min_{E' : E' \le E} |R_{sp}(P,E',W) + E' - E|^+.$$

Proof. Applying definitions (2.19) and (2.15) we see

$$R_r(P,E,W) = \min_{V : D(V\|W|P) \le E} |I_{P,V}(X \wedge Y) + D(V\|W|P) - E|^+ = \min_{E' : E' \le E}\ \min_{V : D(V\|W|P) = E'} |I_{P,V}(X \wedge Y) + E' - E|^+ = \min_{E' : E' \le E} |R_{sp}(P,E',W) + E' - E|^+.$$
Lemma 2.17. Involving

$$E_{cr} = E_{cr}(P,W) = \min\Big\{ E : \frac{\partial R_{sp}(P,E,W)}{\partial E} \ge -1 \Big\},$$

we can write for all $E > 0$

$$R_r(P,E,W) = \begin{cases} R_{sp}(P,E,W), & \text{if } E \le E_{cr}, \\ |R_{sp}(P,E_{cr},W) + E_{cr} - E|^+, & \text{if } E \ge E_{cr}. \end{cases}$$

Proof. Since the function $R_{sp}(P,E,W)$ is convex in $E$, for values of $E$ less than $E_{cr}(P,W)$ the slope of the tangent is less than $-1$, and for $E$ greater than $E_{cr}(P,W)$ it is equal to or greater than $-1$. In other words,

$$\frac{R_{sp}(P,E,W) - R_{sp}(P,E',W)}{E - E'} < -1, \quad \text{when } E' < E \le E_{cr},$$

whence

$$R_{sp}(P,E,W) + E < R_{sp}(P,E',W) + E',$$

and consequently

$$\min_{E' : E' \le E < E_{cr}(P,W)} \big( R_{sp}(P,E',W) + E' \big) = R_{sp}(P,E,W) + E.$$

We obtain from this equality and Lemma 2.16 the statement of the lemma for the case $E \le E_{cr}(P,W)$. Now if $E_{cr}(P,W) \le E' < E$, then

$$\frac{R_{sp}(P,E,W) - R_{sp}(P,E',W)}{E - E'} \ge -1,$$

or

$$R_{sp}(P,E,W) + E \ge R_{sp}(P,E',W) + E',$$

and consequently

$$\min_{E' : E_{cr} \le E'} \big( R_{sp}(P,E',W) + E' \big) = R_{sp}(P,E_{cr},W) + E_{cr}.$$
Again using Lemma 2.16 and the latter equality, we obtain for the case $E \ge E_{cr}$

$$R_r(P,E,W) = \min\Big\{ \min_{E' : E_{cr} \le E' < E} |R_{sp}(P,E',W) + E' - E|^+,\ \min_{E' : E' \le E_{cr}} |R_{sp}(P,E',W) + E' - E|^+ \Big\} = |R_{sp}(P,E_{cr},W) + E_{cr} - E|^+.$$

Remark 2.3. At the point $E = 0$ we have

$$R_{sp}(P,0,W) = R_r(P,0,W) = I_{P,W}(X \wedge Y), \qquad R_{sp}(0,W) = R_r(0,W) = C(W).$$

Remark 2.4. In the interval $(0, E_{cr}(P,W)]$ the functions $R(P,E,W)$ and $R(E,W)$ are determined exactly:

$$R(P,E,W) = R_{sp}(P,E,W) = R_r(P,E,W),$$

and

$$R(E,W) = R_{sp}(E,W) = R_r(E,W).$$

So Theorem 2.8 is proved.
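Lemma 2.16 can also be checked numerically: the sketch below (ours, not from the text) evaluates $R_{sp}$ and $R_r$ for an arbitrary small binary channel by grid search and compares $R_r$ with the minimum form of the lemma; the agreement is approximate only because both $V$ and $E'$ are discretized.

```python
import numpy as np

def I(P, V):
    PV = P @ V
    return sum(P[x] * V[x, y] * np.log2(V[x, y] / PV[y])
               for x in range(2) for y in range(2) if V[x, y] > 0)

def D(V, W, P):
    return sum(P[x] * V[x, y] * np.log2(V[x, y] / W[x, y])
               for x in range(2) for y in range(2) if V[x, y] > 0)

def scan(P, E, W, grid=120):
    """Return (R_sp, R_r) at reliability E; coarse grid keeps runtime small."""
    r_sp = r_r = np.inf
    for v1 in np.linspace(1e-6, 1 - 1e-6, grid):
        for v2 in np.linspace(1e-6, 1 - 1e-6, grid):
            V = np.array([[1 - v1, v1], [v2, 1 - v2]])
            d = D(V, W, P)
            if d <= E:
                i = I(P, V)
                r_sp = min(r_sp, i)
                r_r = min(r_r, max(i + d - E, 0.0))
    return r_sp, r_r

W = np.array([[0.9, 0.1], [0.2, 0.8]])
P = np.array([0.5, 0.5])
Es = np.linspace(0.02, 0.4, 10)
vals = {E: scan(P, E, W) for E in Es}
for E in Es:
    # right-hand side of Lemma 2.16 on the sampled E' grid
    rhs = min(max(vals[Ep][0] + Ep - E, 0.0) for Ep in Es if Ep <= E)
    print(f"E={E:.2f}  R_r={vals[E][1]:.4f}  min-form={rhs:.4f}")
```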


3
Multiuser Channels

3.1 Two-Way Channels

The two-way channel (TWC) was first investigated by Shannon [196]. The channel has two terminals, and transmission in one direction interferes with transmission in the opposite direction. The sources at the two terminals are independent. In the general model of the TWC the encoding at each terminal depends on both the message to be transmitted and the sequence of symbols received at that terminal. Similarly, the decoding at each terminal depends on the sequences of symbols received and sent at that terminal.

Here we shall consider the restricted version of the TWC, where the transmitted sequence at each terminal depends only on the message and not on the sequence received at that terminal (Figure 3.1).

A discrete restricted two-way channel (RTWC) with two terminals is defined by a matrix of transition probabilities

$$W_T = \{W(y_1, y_2 | x_1, x_2),\ x_1 \in \mathcal{X}_1,\ x_2 \in \mathcal{X}_2,\ y_1 \in \mathcal{Y}_1,\ y_2 \in \mathcal{Y}_2\},$$

where $\mathcal{X}_1, \mathcal{X}_2$ are the finite input and $\mathcal{Y}_1, \mathcal{Y}_2$ the finite output alphabets of the channel. The channel is supposed to be memoryless;

Fig. 3.1 Restricted two-way channel.

this means that for $N$-length sequences

$$\mathbf{x}_1 = (x_{11}, x_{12}, \dots, x_{1N}) \in \mathcal{X}_1^N, \quad \mathbf{x}_2 = (x_{21}, x_{22}, \dots, x_{2N}) \in \mathcal{X}_2^N,$$

$$\mathbf{y}_1 = (y_{11}, y_{12}, \dots, y_{1N}) \in \mathcal{Y}_1^N, \quad \mathbf{y}_2 = (y_{21}, y_{22}, \dots, y_{2N}) \in \mathcal{Y}_2^N,$$

the transition probabilities are given in the following way:

$$W^N(\mathbf{y}_1, \mathbf{y}_2 | \mathbf{x}_1, \mathbf{x}_2) = \prod_{n=1}^N W(y_{1n}, y_{2n} | x_{1n}, x_{2n}).$$

Denote

$$W_i(y_i | x_1, x_2) = \sum_{y_{3-i}} W(y_1, y_2 | x_1, x_2), \quad i = 1,2.$$

When the symbol $x_1 \in \mathcal{X}_1$ is sent from terminal 1, the corresponding output symbol $y_1 \in \mathcal{Y}_1$ arrives at terminal 2. At the same time the input symbol $x_2$ is transmitted from terminal 2 and the symbol $y_2$ arrives at terminal 1.

Let $\mathcal{M}_1 = \{1, 2, \dots, M_1\}$ and $\mathcal{M}_2 = \{1, 2, \dots, M_2\}$ be the message sets of the corresponding sources. A code for the RTWC is a collection of mappings $(f_1, f_2, g_1, g_2)$, where $f_1 : \mathcal{M}_1 \to \mathcal{X}_1^N$, $f_2 : \mathcal{M}_2 \to \mathcal{X}_2^N$ are the encodings and $g_1 : \mathcal{M}_2 \times \mathcal{Y}_1^N \to \mathcal{M}_1$, $g_2 : \mathcal{M}_1 \times \mathcal{Y}_2^N \to \mathcal{M}_2$ are the decodings. The numbers

$$\frac{1}{N} \log M_i, \quad i = 1,2,$$

are called code rates. Denote

$$f(m_1, m_2) = (f_1(m_1), f_2(m_2)),$$

$$g_i^{-1}(m_i | m_{3-i}) = \{\mathbf{y}_i : g_i(m_{3-i}, \mathbf{y}_i) = m_i\}, \quad i = 1,2;$$
3.1 Two-Way Channels 35

then

ei (m1 , m2 ) = WiN {YiN − gi−1 (mi |m3−i )|f (m1 , m2 )}, i = 1, 2, (3.1)

are the error probabilities of messages m1 and m2 .


We shall consider the average error probabilities of the code
1 X
ei (f, g, N, WT ) = ei (m1 , m2 ), i = 1, 2. (3.2)
M1 M2 m ,m
1 2

Let E = (E1 , E2 ), Ei > 0, i = 1, 2. Nonnegative numbers R1 , R2 are


said to be E-achievable rate pair for RTWC, if for any δi > 0, i = 1, 2,
there exists a code such that for sufficiently large N
1
log Mi ≥ Ri − δi , i = 1, 2,
N
and

ei (f, g, N, WT ) ≤ exp{−N Ei }, i = 1, 2. (3.3)

The region of all E-achievable rate pairs is called the E-capacity region
for average error probability and denoted by C(E, WT ).
The RTWC as well as the general TWC were first investigated
by Shannon [196], who obtained the capacity region C(WT ) of the
RTWC. The capacity region of the general TWC has not been found
up to now. Important results relative to various models of two-way
channels were obtained by authors of works [1, 2, 3, 61, 62, 144,
188, 228]. In particular, Dueck [61] demonstrated that the capacity
regions of TWC for average and maximal error probabilities do not
coincide.
Here outer and inner bounds for $C(E, W_T)$ are constructed. Unfortunately, these bounds do not coincide, and closing this gap remains an open problem.
Consider the following PDs:

$$P = \{P(x_1, x_2),\ x_1 \in \mathcal{X}_1,\ x_2 \in \mathcal{X}_2\},$$
$$P_i = \Big\{P_i(x_i) = \sum_{x_{3-i}} P(x_1, x_2),\ x_i \in \mathcal{X}_i\Big\}, \quad i = 1, 2,$$

P ∗ = {P ∗ (x1 , x2 ) = P1 (x1 )P2 (x2 ), x1 ∈ X1 , x2 ∈ X2 },


P ◦ Vi = {P (x1 , x2 )Vi (yi |x1 , x2 ), x1 ∈ X1 , x2 ∈ X2 , yi ∈ Yi }, i = 1, 2,

where V1 , V2 are probability matrices.


Let us introduce notations for outer and inner bounds of the
E-capacity region. The following sphere packing region Rsp (E, WT ) in
the coordinate space R1 , R2 will serve as an outer bound of C(E, WT ).
Denote
$$\mathcal{R}_{sp}(P, E, W_T) = \Big\{(R_1, R_2) :\ 0 \le R_1 \le \min_{V_1 : D(V_1 \| W_1 | P) \le E_1} I_{P,V_1}(X_1 \wedge Y_1 | X_2), \qquad (3.4)$$
$$0 \le R_2 \le \min_{V_2 : D(V_2 \| W_2 | P) \le E_2} I_{P,V_2}(X_2 \wedge Y_2 | X_1)\Big\}, \qquad (3.5)$$
and
$$\mathcal{R}_{sp}(E, W_T) = \mathrm{co}\Big(\bigcup_P \mathcal{R}_{sp}(P, E, W_T)\Big).$$

Hereafter we shall denote the convex hull of the set A as co(A).


The following random coding region Rr (E, WT ) will serve as an inner
bound for C(E, WT ). Let
$$\mathcal{R}_r(P^*, E, W_T) = \Big\{(R_1, R_2) :\ R_1 \ge 0,\ R_2 \ge 0,$$
$$R_1 \le \min_{P, V_1 : D(P \circ V_1 \| P^* \circ W_1) \le E_1} \big|I_{P,V_1}(X_1 \wedge X_2, Y_1) + D(P \circ V_1 \| P^* \circ W_1) - E_1\big|^+, \qquad (3.6)$$
$$R_2 \le \min_{P, V_2 : D(P \circ V_2 \| P^* \circ W_2) \le E_2} \big|I_{P,V_2}(X_2 \wedge X_1, Y_2) + D(P \circ V_2 \| P^* \circ W_2) - E_2\big|^+\Big\}, \qquad (3.7)$$
and
$$\mathcal{R}_r(E, W_T) = \mathrm{co}\Big(\bigcup_{P^*} \mathcal{R}_r(P^*, E, W_T)\Big).$$

The following theorem is proved in [118].



Theorem 3.1. For all E = (E1 , E2 ), E1 > 0, E2 > 0,

Rr (E, WT ) ⊆ C(E, WT ) ⊆ Rsp (E, WT ).

Corollary 3.1. The limit of Rr (E, WT ), when E1 → 0, E2 → 0, coin-


cides with the capacity region of RTWC found by Shannon [196]:
$$\mathcal{R}_r(W_T) = \mathrm{co}\Big(\bigcup_{P^*} \mathcal{R}_r(P^*, W_T)\Big),$$
where
$$\mathcal{R}_r(P^*, W_T) = \{(R_1, R_2) :\ 0 \le R_1 \le I_{P^*,W_1}(X_1 \wedge Y_1 | X_2),\ 0 \le R_2 \le I_{P^*,W_2}(X_2 \wedge Y_2 | X_1)\}.$$

The proof of the lower bound is based on a modification of the


packing Lemma 2.6.

Example 3.1. Assume that the channel is binary: $\mathcal{X}_1 = \mathcal{X}_2 = \mathcal{Y}_1 = \mathcal{Y}_2 = \{0, 1\}$, and $E_1 = E_2$. Consider the channel with the following matrices of transition probabilities:
$$W_1 :\ x_2 = 0 :\ \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}, \qquad x_2 = 1 :\ \begin{pmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{pmatrix},$$
$$W_2 :\ x_1 = 0 :\ \begin{pmatrix} 0.9 & 0.1 \\ 0.3 & 0.7 \end{pmatrix}, \qquad x_1 = 1 :\ \begin{pmatrix} 0.8 & 0.2 \\ 0.1 & 0.9 \end{pmatrix}.$$
Let us compare the inner and the outer bounds of the capacity region, which are represented in Figures 3.2(b) and 3.2(a), respectively. The two bounds do not differ visually, but numerical computation shows that the outer-bound region contains points which do not belong to the inner-bound region, so that the inclusion $\mathcal{R}_r(E, W_T) \subset \mathcal{R}_{sp}(E, W_T)$ is in fact strict.

Fig. 3.2 (a) The outer bound of the capacity region. (b) The inner bound on the capacity
region.

For those points, over the same interval of rates $R_1$, the average difference of the rates $R_2$ approximately equals 0.000675025.
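The corner quantities of the limiting Shannon region (Corollary 3.1) for this example are easy to evaluate directly. Below is a minimal Python sketch (ours, not from the original text); the uniform input PDs are an arbitrary illustrative choice, not the optimizing distributions:

```python
import numpy as np

def mutual_information(p_x, W):
    """I(X ∧ Y) in bits for input PD p_x and channel matrix W[x, y]."""
    p_xy = p_x[:, None] * W
    p_y = p_xy.sum(axis=0)
    mask = p_xy > 0
    ind = p_x[:, None] * p_y[None, :]
    return float((p_xy[mask] * np.log2(p_xy[mask] / ind[mask])).sum())

# Example 3.1 matrices: W1[x2] has rows x1 and columns y1,
# and W2[x1] has rows x2 and columns y2.
W1 = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
      1: np.array([[0.8, 0.2], [0.3, 0.7]])}
W2 = {0: np.array([[0.9, 0.1], [0.3, 0.7]]),
      1: np.array([[0.8, 0.2], [0.1, 0.9]])}

# Independent uniform inputs (illustrative, not optimal).
P1 = np.array([0.5, 0.5])
P2 = np.array([0.5, 0.5])

# I(X1 ∧ Y1 | X2) and I(X2 ∧ Y2 | X1), the bounding quantities
# of Corollary 3.1, for this product input PD.
R1 = sum(P2[x2] * mutual_information(P1, W1[x2]) for x2 in (0, 1))
R2 = sum(P1[x1] * mutual_information(P2, W2[x1]) for x1 in (0, 1))
print(R1, R2)
```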
Now consider the inner and outer bounds of the E-capacity region, which are represented in Figures 3.3(a) and 3.3(b), respectively. The critical value of the reliability is $E_{cr} \approx 0.051$, beyond which the difference between the inner and outer bounds grows faster; for $E \approx 0.191$ the inner bound becomes 0.

3.2 Interference Channels


Shannon [196] also considered another version of the TWC, in which the transmission of information from one sender to its corresponding receiver may interfere with the transmission of information from the other sender to its receiver; this channel was later called the interference channel (IFC). The general interference channel (GIFC) with two input and two output terminals is depicted in Figure 3.4.
The GIFC differs from the TWC in two respects: the sender at each terminal does not observe the outputs at that terminal, and there is no side information at the receivers.
Ahlswede [2, 3] obtained bounds for the capacity region of the GIFC. The papers of Carleial [39, 40, 41] are also devoted to the investigation of the IFC; further results are derived in a series of other works

Fig. 3.3 (a) The inner bound of E-capacity. (b) The outer bound of E-capacity.

Fig. 3.4 General interference channel.

[25, 44, 66, 90, 96, 169, 189, 203] but the capacity region is found only
in particular cases.
The main definitions are the same as for the TWC, except for the error probabilities of messages $m_1$ and $m_2$, which are
$$e_i(m_1, m_2) = W_i^N\{\mathcal{Y}_i^N - g_i^{-1}(m_i) \,|\, f(m_1, m_2)\}, \quad i = 1, 2,$$
since the decoding functions are $g_i : \mathcal{Y}_i^N \to \mathcal{M}_i$, $i = 1, 2$.
In [129] the following theorem is proved.

Theorem 3.2. For all E = (E1 , E2 ), E1 > 0, E2 > 0 the following


region
$$\mathcal{R}_r(P^*, E, W_I) = \Big\{(R_1, R_2) :\ R_1 \ge 0,\ R_2 \ge 0,$$
$$R_1 \le \min_{P, V_1 : D(P \circ V_1 \| P^* \circ W_1) \le E_1} \big|I_{P,V_1}(Y_1 \wedge X_1) + D(P \circ V_1 \| P^* \circ W_1) - E_1\big|^+,$$
$$R_2 \le \min_{P, V_2 : D(P \circ V_2 \| P^* \circ W_2) \le E_2} \big|I_{P,V_2}(Y_2 \wedge X_2) + D(P \circ V_2 \| P^* \circ W_2) - E_2\big|^+\Big\},$$
$$\mathcal{R}_r(E, W_I) = \bigcup_{P^*} \mathcal{R}_r(P^*, E, W_I)$$
is the random coding bound of the E-capacity region in the case of average error probability for the GIFC:
$$\mathcal{R}_r(E, W_I) \subseteq C(E, W_I).$$

The proof is based on another version of the packing Lemma 2.6.

Lemma 3.3. For any $E_1 > 0$, $E_2 > 0$, $\delta \in (0, \min(E_1, E_2))$, and type $P^*$ on $\mathcal{X}_1 \times \mathcal{X}_2$, if
$$0 \le \frac{1}{N}\log M_1 \le \min_{P, V_1 : D(P \circ V_1 \| P^* \circ W_1) \le E_1} \big|I_{P,V_1}(Y_1 \wedge X_1) + D(P \circ V_1 \| P^* \circ W_1) - E_1\big|^+ - \delta,$$
$$0 \le \frac{1}{N}\log M_2 \le \min_{P, V_2 : D(P \circ V_2 \| P^* \circ W_2) \le E_2} \big|I_{P,V_2}(Y_2 \wedge X_2) + D(P \circ V_2 \| P^* \circ W_2) - E_2\big|^+ - \delta,$$
then there exist $M_1$ not necessarily distinct vectors $\mathbf{x}_1(m_1) \in \mathcal{T}_{P_1}(X_1)$ and $M_2$ vectors $\mathbf{x}_2(m_2) \in \mathcal{T}_{P_2}(X_2)$ such that for all $P : \mathcal{X}_1 \to \mathcal{X}_2$, $P' : \mathcal{X}_1 \to \mathcal{X}_2$, $V_i : \mathcal{X}_1 \times \mathcal{X}_2 \to \mathcal{Y}_i$, $V_i' : \mathcal{X}_1 \times \mathcal{X}_2 \to \mathcal{Y}_i$, $i = 1, 2$, and sufficiently large $N$ the following inequalities take place:
$$\sum_{f(m_1,m_2) \in \mathcal{T}_P(X_1,X_2)} \Big| \mathcal{T}_{P,V_1}(Y_1 | f(m_1,m_2)) \cap \bigcup_{m_2';\, m_1' \ne m_1} \mathcal{T}_{P',V_1'}(Y_1 | f(m_1',m_2')) \Big|$$
$$\le \exp\{N H_{P,V_1}(Y_1 | X_1 X_2)\} \exp\{-N\big|E_1 - D(P' \circ V_1' \| P^* \circ W_1)\big|^+\} \times M_1 M_2 \exp\{-N(D(P \| P^*) - \delta)\},$$
and
$$\sum_{f(m_1,m_2) \in \mathcal{T}_P(X_1,X_2)} \Big| \mathcal{T}_{P,V_2}(Y_2 | f(m_1,m_2)) \cap \bigcup_{m_1';\, m_2' \ne m_2} \mathcal{T}_{P',V_2'}(Y_2 | f(m_1',m_2')) \Big|$$
$$\le \exp\{N H_{P,V_2}(Y_2 | X_1 X_2)\} \exp\{-N\big|E_2 - D(P' \circ V_2' \| P^* \circ W_2)\big|^+\} \times M_1 M_2 \exp\{-N(D(P \| P^*) - \delta)\}.$$

Now consider the situation where the second encoder learns from the first encoder the codeword that will be sent in the present block. This model, called the IFC with cribbing encoders, is shown in Figure 3.5. In this case the second codeword depends on the choice of the first one: $f_2 : \mathcal{M}_2 \times \mathcal{X}_1^N \to \mathcal{X}_2^N$, and the random coding bound of the E-capacity

Fig. 3.5 Interference channel with cribbing encoders.

region for average error probability in Theorem 3.2 will take the fol-
lowing form:
$$\mathcal{R}_r(P, E, W) = \Big\{(R_1, R_2) :\ R_1 \ge 0,\ R_2 \ge 0,$$
$$R_1 \le \min_{V_1 : D(V_1 \| W_1 | P) \le E_1} \big|I_{P,V_1}(Y_1 \wedge X_1) + D(V_1 \| W_1 | P) - E_1\big|^+,$$
$$R_2 \le \min_{V_2 : D(V_2 \| W_2 | P) \le E_2} \big|I_{P,V_2}(Y_2 \wedge X_2) - I_P(X_1 \wedge X_2) + D(V_2 \| W_2 | P) - E_2\big|^+\Big\},$$
$$\mathcal{R}_r(E, W) = \bigcup_P \mathcal{R}_r(P, E, W).$$

In this case the following lemma is applied:

Lemma 3.4. For all $E_1 > 0$, $E_2 > 0$, $\delta \in (0, \min(E_1, E_2))$, and type $P$, if
$$\frac{1}{N}\log M_1 \le \min_{V_1 : D(V_1 \| W_1 | P) \le E_1} \big|I_{P,V_1}(Y_1 \wedge X_1) + D(V_1 \| W_1 | P) - E_1\big|^+ - \delta,$$
$$\frac{1}{N}\log M_2 \le \min_{V_2 : D(V_2 \| W_2 | P) \le E_2} \big|I_{P,V_2}(Y_2 \wedge X_2) - I_P(X_1 \wedge X_2) + D(V_2 \| W_2 | P) - E_2\big|^+ - \delta,$$
then there exist $M_1$ not necessarily distinct vectors $\mathbf{x}_1(m_1) \in \mathcal{T}_{P_1}(X_1)$ and for each $\mathbf{x}_1(m_1) \in \mathcal{T}_{P_1}(X_1)$ there exist $M_2$ not necessarily distinct vectors $\mathbf{x}_2(m_2, \mathbf{x}_1(m_1)) \in \mathcal{T}_{P_2}(X_2 | \mathbf{x}_1(m_1))$ such that for all $V_i : \mathcal{X}_1 \times \mathcal{X}_2 \to \mathcal{Y}_i$, $V_i' : \mathcal{X}_1 \times \mathcal{X}_2 \to \mathcal{Y}_i$, $i = 1, 2$, and sufficiently large $N$ the following inequalities take place:
$$\frac{1}{M_1 M_2} \sum_{m_1, m_2} \Big| \mathcal{T}^N_{P,V_1}(Y_1 | f(m_1,m_2)) \cap \bigcup_{m_1' \ne m_1} \bigcup_{m_2'} \mathcal{T}^N_{P,V_1'}(Y_1 | f(m_1',m_2')) \Big|$$
$$\le \exp\{N H_{P,V_1}(Y_1 | X_1 X_2)\} \exp\{-N\big|E_1 - D(V_1' \| W_1 | P)\big|^+\},$$
$$\frac{1}{M_1 M_2} \sum_{m_1, m_2} \Big| \mathcal{T}^N_{P,V_2}(Y_2 | f(m_1,m_2)) \cap \bigcup_{m_2' \ne m_2} \bigcup_{m_1'} \mathcal{T}^N_{P,V_2'}(Y_2 | f(m_1',m_2')) \Big|$$
$$\le \exp\{N H_{P,V_2}(Y_2 | X_1 X_2)\} \exp\{-N\big|E_2 - D(V_2' \| W_2 | P)\big|^+\}.$$

Here we omit the proofs of Theorem 3.2 and Lemmas 3.3 and 3.4, leaving them as exercises for the reader.

3.3 Broadcast Channels


A broadcast channel (BC) is a communication system in which there is one encoder and two or more receivers.
We shall consider the BC $W_B$ with two receivers (see Figure 3.6). There are three sources, two private and one common, a finite input alphabet $\mathcal{X}$, two finite output alphabets $\mathcal{Y}_1, \mathcal{Y}_2$, and the probability transition function $W(y_1, y_2 | x)$. Denote by $W_1, W_2$ the marginal transition probabilities:
$$W_1(y_1|x) = \sum_{y_2 \in \mathcal{Y}_2} W(y_1, y_2 | x), \qquad W_2(y_2|x) = \sum_{y_1 \in \mathcal{Y}_1} W(y_1, y_2 | x).$$

Fig. 3.6 Broadcast channel.



The three sources create messages $m_0, m_1, m_2$ from the corresponding finite message sets $\mathcal{M}_0, \mathcal{M}_1, \mathcal{M}_2$. The messages must be encoded by a common encoder into one codeword and transmitted through the channel $W$ to the two receivers.
A code is a triple of mappings $(f, g_1, g_2)$, where $f : \mathcal{M}_0 \times \mathcal{M}_1 \times \mathcal{M}_2 \to \mathcal{X}^N$ is the encoding and $g_1 : \mathcal{Y}_1^N \to \mathcal{M}_0 \times \mathcal{M}_1$, $g_2 : \mathcal{Y}_2^N \to \mathcal{M}_0 \times \mathcal{M}_2$ are the decodings. The code rates are $N^{-1} \log M_i$, $i = 0, 1, 2$.
We denote by $\mathcal{D}^i_{m_0, m_i} = g_i^{-1}(m_0, m_i)$ the set of all $\mathbf{y}_i$ which are decoded into $(m_0, m_i)$, $i = 1, 2$. These sets are disjoint and
$$\bigcup_{m_0, m_i} \mathcal{D}^i_{m_0, m_i} = \mathcal{Y}_i^N, \quad i = 1, 2.$$

The decoding error probabilities at the two outputs, when messages $m_0, m_1, m_2$ are transmitted, depend on the code and on the $N$-dimensional matrices of transition probabilities $W_i^N$, $i = 1, 2$, and are defined by the condition of absence of memory:
$$e_i(m_0, m_1, m_2) = W_i^N\Big\{\bigcup_{(m_0', m_i') \ne (m_0, m_i)} \mathcal{D}^i_{m_0', m_i'} \,\Big|\, f(m_0, m_1, m_2)\Big\}, \quad i = 1, 2.$$

For the code $(f, g_1, g_2)$ we consider the maximal
$$e_i(f, g_i, N, W_B) = \max_{m_0, m_1, m_2} e_i(m_0, m_1, m_2), \quad i = 1, 2, \qquad (3.8)$$
and average
$$\overline{e}_i(f, g_i, N, W_B) = \frac{1}{M_0 M_1 M_2} \sum_{m_0, m_1, m_2} e_i(m_0, m_1, m_2), \quad i = 1, 2, \qquad (3.9)$$
error probabilities.
Let $E = (E_1, E_2)$, $E_i > 0$, $i = 1, 2$. Nonnegative real numbers $R_0, R_1, R_2$ are called an $E$-achievable rate triple for the general BC if for any $\delta > 0$ and sufficiently large $N$ there exists a code such that
$$\frac{1}{N} \log M_0 M_i \ge R_0 + R_i - \delta, \quad i = 1, 2, \qquad (3.10)$$

and

ei (f, gi , N, WB ) ≤ exp{−N Ei }, i = 1, 2. (3.11)

The E-capacity region $C(E, W_B)$ for maximal error probability is the set of all $E$-achievable rate triples. We also consider the E-capacity region $\overline{C}(E, W_B)$ for the case of average error probabilities in (3.11).
BCs were first studied by Cover [45]. The capacity region of the deterministic BC was found by Pinsker [180] and Marton [172]. The capacity region of the BC with one deterministic component was determined independently by Marton [172] and by Gelfand and Pinsker [85]. A series of works is devoted to the solution of the problem for the asymmetric BC [46, 153, 210]; the capacity region was found in [153]. Although many works have considered various models of the BC ([20, 31, 68, 80, 83, 84, 88, 91, 154, 159, 183, 190] and others), the capacity region of the BC in the situation where two private messages and one common message must be transmitted has not yet been found. Bounds for the rate-reliability functions were not specified either.
inner bound for the capacity region of BC was found. Outer bounds for
capacity region of BC without common message (when R0 = 0) were
found in [45] and in [190].
New bounds for capacity of BC are obtained by Liang and Kramer
[162]. In [218] Willems proved that the capacity regions of BC for
maximal and average error probabilities are the same. The works
[35, 47, 48, 51, 58, 152, 211] contain detailed surveys.
Here an inner bound for the E-capacity region of the BC, obtained in [130], is formulated. When $E \to 0$ we obtain the random coding bound of the capacity region $C(W_B)$, which coincides with the bound obtained in [172].
Let U0 , U1 , U2 be some finite sets. Consider RVs U0 , U1 , U2 , X, Y1 , Y2
with values, correspondingly, in U0 , U1 , U2 , X , Y1 , Y2 and with
joint PD

$$Q \circ P \circ V_i = \{Q \circ P \circ V_i(u_0, u_1, u_2, x, y_i) = Q(u_0, u_1, u_2) P(x | u_0, u_1, u_2) V_i(y_i | x),\ i = 1, 2\},$$
where
$$Q = \{Q(u_0, u_1, u_2),\ u_i \in \mathcal{U}_i,\ i = 0, 1, 2\},$$
$$P = \{P(x | u_0, u_1, u_2),\ x \in \mathcal{X},\ u_i \in \mathcal{U}_i,\ i = 0, 1, 2\},$$
$$V_i = \{V_i(y_i | x),\ x \in \mathcal{X},\ y_i \in \mathcal{Y}_i\},\ i = 1, 2.$$

So we have Markov chains (U0 , U1 , U2 ) → X → Yi , i = 1, 2.


Let us denote

Vi (Q, P, Ei ) = {Vi : D(Vi kWi |Q, P ) ≤ Ei }, i = 1, 2. (3.12)

To formulate the inner bound of E-capacity region we consider the


following inequalities for i = 1, 2:
 
$$0 \le R_0 \le \min_i\ \min_{V_i \in \mathcal{V}_i(Q,P,E_i)} \big|I_{Q,P,V_i}(Y_i \wedge U_0) + D(V_i \| W_i | Q, P) - E_i\big|^+, \qquad (3.13)$$
$$0 \le R_i \le \min_{V_i \in \mathcal{V}_i(Q,P,E_i)} \big|I_{Q,P,V_i}(Y_i \wedge U_i | U_0) + D(V_i \| W_i | Q, P) - E_i\big|^+, \qquad (3.14)$$
$$0 \le R_{3-i} \le \min_{V_{3-i} \in \mathcal{V}_{3-i}(Q,P,E_{3-i})} \big|I_{Q,P,V_{3-i}}(Y_{3-i} \wedge U_{3-i} | U_0) + D(V_{3-i} \| W_{3-i} | Q, P) - E_{3-i}\big|^+ - I_Q(U_1 \wedge U_2 | U_0), \qquad (3.15)$$
and the regions
$$\mathcal{R}^i_r(Q, P, E, W_B) = \{(R_0, R_1, R_2) :\ \text{inequalities (3.13)--(3.15) hold for some } (U_0, U_1, U_2) \to X \to Y_i,\ i = 1, 2\},$$
$$\mathcal{R}_r(Q, P, E, W_B) = \bigcup_{i=1,2} \mathcal{R}^i_r(Q, P, E, W_B), \qquad (3.16)$$
$$\mathcal{R}_r(E, W_B) = \bigcup_{QP \in \mathcal{QP}(\mathcal{U}_0 \times \mathcal{U}_1 \times \mathcal{U}_2 \times \mathcal{X})} \mathcal{R}_r(Q, P, E, W_B).$$

The following result is obtained in [130].



Theorem 3.5. For all $E_1 > 0$, $E_2 > 0$ the region $\mathcal{R}_r(E, W_B)$ is an inner estimate for the E-capacity region of the BC:
$$\mathcal{R}_r(E, W_B) \subseteq C(E, W_B) \subseteq \overline{C}(E, W_B);$$
in other words, any rate triple $(R_0, R_1, R_2) \in \mathcal{R}_r(E, W_B)$ is $E$-achievable for the BC $W_B$.

In [88] Hajek and Pursley conjectured that the achievable rates


region of BC will not change if we consider
|U0 | ≤ min(|X |, max(|Y1 |, |Y2 |))
and

|Ui | ≤ 1 + |U0 | · (min(|X |, |Yi |) − 1), i = 1, 2.

Corollary 3.2. When E1 → 0, E2 → 0, using time sharing arguments


we obtain from (3.16) the inner bound for the capacity region C(WB )
of BC, obtained by Marton [172]:
$$\mathcal{R}_r(W_B) = \bigcup_{QP \in \mathcal{QP}(\mathcal{U}_0 \times \mathcal{U}_1 \times \mathcal{U}_2 \times \mathcal{X})} \mathcal{R}_r(Q, P, W_B),$$
where $\mathcal{R}_r(Q, P, W_B)$ is the set of all $(R_0, R_1, R_2)$ such that, for some $(U_0, U_1, U_2) \to X \to Y_i$, $i = 1, 2$,
$$0 \le R_0 \le \min\{I_{Q,P,W_1}(Y_1 \wedge U_0),\ I_{Q,P,W_2}(Y_2 \wedge U_0)\},$$
$$0 \le R_0 + R_i \le I_{Q,P,W_i}(Y_i \wedge U_0 U_i), \quad i = 1, 2,$$
$$R_0 + R_1 + R_2 \le \min\{I_{Q,P,W_1}(Y_1 \wedge U_0),\ I_{Q,P,W_2}(Y_2 \wedge U_0)\} + I_{Q,P,W_1}(Y_1 \wedge U_1 | U_0) + I_{Q,P,W_2}(Y_2 \wedge U_2 | U_0) - I_Q(U_1 \wedge U_2 | U_0).$$

3.4 Multiple-Access Channels


The discrete memoryless multiple-access channel (MAC) with two encoders and one decoder, $W_M : \mathcal{X}_1 \times \mathcal{X}_2 \to \mathcal{Y}$, is defined by a matrix of transition probabilities
$$W_M = \{W(y | x_1, x_2),\ x_1 \in \mathcal{X}_1,\ x_2 \in \mathcal{X}_2,\ y \in \mathcal{Y}\},$$
where $\mathcal{X}_1$ and $\mathcal{X}_2$ are the finite alphabets of the first and the second inputs of the channel and $\mathcal{Y}$ is the finite output alphabet.
There exist various configurations of the MAC [2, 185, 199, 217].
The most general model of the MAC, the MAC with correlated encoder
inputs, was first studied by Slepian and Wolf [199] and then by Han
[89]. Three independent sources create messages to be transmitted by
two encoders (Figure 3.7). One of the sources is connected with both
encoders and each of the two others is connected with only one of the
encoders.
Let $\mathcal{M}_0 = \{1, 2, \dots, M_0\}$, $\mathcal{M}_1 = \{1, 2, \dots, M_1\}$, and $\mathcal{M}_2 = \{1, 2, \dots, M_2\}$ be the message sets of the corresponding sources. The code of length $N$ for this model is a collection of mappings $(f_1, f_2, g)$, where $f_1 : \mathcal{M}_0 \times \mathcal{M}_1 \to \mathcal{X}_1^N$, $f_2 : \mathcal{M}_0 \times \mathcal{M}_2 \to \mathcal{X}_2^N$ are the encodings and $g : \mathcal{Y}^N \to \mathcal{M}_0 \times \mathcal{M}_1 \times \mathcal{M}_2$ is the decoding. The numbers $N^{-1} \log M_i$, $i = 0, 1, 2$, are called code rates. Denote
$$f_1(m_0, m_1) = \mathbf{x}_1(m_0, m_1), \quad f_2(m_0, m_2) = \mathbf{x}_2(m_0, m_2),$$
$$g^{-1}(m_0, m_1, m_2) = \{\mathbf{y} : g(\mathbf{y}) = (m_0, m_1, m_2)\};$$
then
$$e(m_0, m_1, m_2) = W^N\{\mathcal{Y}^N - g^{-1}(m_0, m_1, m_2) \,|\, f_1(m_0, m_1), f_2(m_0, m_2)\} \qquad (3.17)$$

Fig. 3.7 MAC with correlated encoder inputs.



are the error probabilities of messages $m_0, m_1$, and $m_2$. We consider the average error probability of the code
$$\overline{e}(f_1, f_2, g, N, W_M) = \frac{1}{M_0 M_1 M_2} \sum_{m_0, m_1, m_2} e(m_0, m_1, m_2). \qquad (3.18)$$

Dueck [61] has shown that in general the maximal-error capacity region of the MAC is smaller than the corresponding average-error capacity region. Determination of the maximal-error capacity region of the MAC in various communication situations is still an open problem.
In [199] the achievable rates region of the MAC with correlated sources was found and the random coding bound for the reliability function was constructed; in [101] a sphere packing bound was obtained.
Various bounds for error probability exponents and related results have also been derived in [64, 65, 81, 166, 182]; the E-capacity region was investigated in [119, 132].
To formulate the results let us introduce an auxiliary RV U with
values in a finite set U. Let RVs U, X1 , X2 , Y with values in alpha-
bets U, X1 , X2 , Y, respectively, form the following Markov chain: U →
(X1 , X2 ) → Y and be given by the following PDs:

P0 = {P0 (u), u ∈ U},


Pi∗ = {Pi∗ (xi |u), xi ∈ Xi }, i = 1, 2,

P = {P0 (u)P1∗ (x1 |u)P2∗ (x2 |u), x1 ∈ X1 , x2 ∈ X2 },
P = {P (u, x1 , x2 ) = P0 (u)P (x1 , x2 |u), x1 ∈ X1 , x2 ∈ X2 },

P
with x3−i P (x1 , x2 |u) = Pi (xi |u), i = 1, 2, and joint PD P ◦ V =
{P0 (u)P (x1 , x2 |u)V (y|x1 , x2 ), x1 ∈ X1 , x2 ∈ X2 , y ∈ Y}, where V =
{V (y|x1 , x2 ), x1 ∈ X1 , x2 ∈ X2 , y ∈ Y} is some conditional PD.
To define the random coding region Rr (E, WM ) obtained in [132]
we shall use the notion of conditional mutual information among three
RVs introduced by Liu and Hughes in [166]:

$$I_{P,V}(X_1 \wedge X_2 \wedge Y | U) = H_{P_1^*}(X_1 | U) + H_{P_2^*}(X_2 | U) + H_{P,V}(Y | U) - H_{P,V}(Y, X_1, X_2 | U)$$
$$= I_{P,V}(X_1, X_2 \wedge Y | U) + I_P(X_1 \wedge X_2 | U).$$
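This identity can be checked numerically from any joint PD. A small Python sketch (ours; the randomly drawn binary distributions are arbitrary illustrations) verifies that the entropy form equals $I_{P,V}(X_1,X_2 \wedge Y|U) + I_P(X_1 \wedge X_2|U)$:

```python
import numpy as np

def H(p):
    """Entropy in bits of a PD given as an array of probabilities."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# A randomly drawn joint PD p[u, x1, x2, y] = P0(u) P(x1,x2|u) V(y|x1,x2);
# all alphabets binary for brevity.
rng = np.random.default_rng(0)
P0 = rng.dirichlet(np.ones(2))
Px = rng.dirichlet(np.ones(4), size=2).reshape(2, 2, 2)  # P(x1,x2|u)
V = rng.dirichlet(np.ones(2), size=(2, 2))               # V(y|x1,x2)
p = P0[:, None, None, None] * Px[:, :, :, None] * V[None, :, :, :]

HU = H(p.sum(axis=(1, 2, 3)))
# Left-hand side: H(X1|U) + H(X2|U) + H(Y|U) - H(X1,X2,Y|U).
lhs = ((H(p.sum(axis=(2, 3))) - HU) + (H(p.sum(axis=(1, 3))) - HU)
       + (H(p.sum(axis=(1, 2))) - HU) - (H(p) - HU))
# Right-hand side: I(X1,X2 ∧ Y|U) + I(X1 ∧ X2|U).
i_xy = H(p.sum(axis=3)) + H(p.sum(axis=(1, 2))) - HU - H(p)
i_xx = H(p.sum(axis=(2, 3))) + H(p.sum(axis=(1, 3))) - HU - H(p.sum(axis=3))
assert abs(lhs - (i_xy + i_xx)) < 1e-12
```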

Then the random coding region is
$$\mathcal{R}_r(P^*, E, W_M) = \{(R_0, R_1, R_2) :\ R_0 \ge 0,\ R_1 \ge 0,\ R_2 \ge 0,$$
$$R_i \le \min_{P, V : D(P \circ V \| P^* \circ W) \le E} \big|I_{P,V}(X_i \wedge X_{3-i}, Y | U) + D(P \circ V \| P^* \circ W) - E\big|^+,\ i = 1, 2, \qquad (3.19)$$
$$R_1 + R_2 \le \min_{P, V : D(P \circ V \| P^* \circ W) \le E} \big|I_{P,V}(X_1 \wedge X_2 \wedge Y | U) + D(P \circ V \| P^* \circ W) - E\big|^+, \qquad (3.20)$$
$$R_0 + R_1 + R_2 \le \min_{P, V : D(P \circ V \| P^* \circ W) \le E} \big|I_{P,V}(X_1, X_2 \wedge Y) + I_P(X_1 \wedge X_2 | U) + D(P \circ V \| P^* \circ W) - E\big|^+\}, \qquad (3.21)$$
and
$$\mathcal{R}_r(E, W_M) = \mathrm{co}\Big\{\bigcup_{P^*} \mathcal{R}_r(P^*, E, W_M)\Big\}.$$

The sphere packing bound of [119] is the following:
$$\mathcal{R}_{sp}(P, E, W_M) = \{(R_0, R_1, R_2) :$$
$$0 \le R_i \le \min_{V : D(V \| W | P) \le E} I_{P,V}(X_i \wedge Y | U, X_{3-i}),\ i = 1, 2, \qquad (3.22)$$
$$R_1 + R_2 \le \min_{V : D(V \| W | P) \le E} I_{P,V}(X_1, X_2 \wedge Y | U), \qquad (3.23)$$
$$R_0 + R_1 + R_2 \le \min_{V : D(V \| W | P) \le E} I_{P,V}(X_1, X_2 \wedge Y)\}, \qquad (3.24)$$
and
$$\mathcal{R}_{sp}(E, W_M) = \mathrm{co}\Big\{\bigcup_P \mathcal{R}_{sp}(P, E, W_M)\Big\}.$$

Theorem 3.6. For all E > 0, for MAC with correlated sources
Rr (E, WM ) ⊆ C(E, WM ) ⊆ Rsp (E, WM ).

Corollary 3.3. When E → 0, we obtain the inner and outer estimates


for the channel capacity region, the expressions of which are similar

Fig. 3.8 Regular MAC.

but differ by the PDs $P$ and $P^*$. The inner bound coincides with the capacity region
$$\mathcal{R}_r(P^*, W_M) = \big\{(R_0, R_1, R_2) :\ 0 \le R_i \le I_{P^*,W}(X_i \wedge Y | X_{3-i}, U),\ i = 1, 2,$$
$$R_1 + R_2 \le I_{P^*,W}(X_1, X_2 \wedge Y | U), \quad R_0 + R_1 + R_2 \le I_{P^*,W}(X_1, X_2 \wedge Y)\big\},$$
obtained in [199], where it was also proved that in this case it is enough to consider $|\mathcal{U}| \le |\mathcal{Y}| + 3$.

The results for the following special cases can be obtained from the
general case.
Regular MAC. In the case with M0 = 1 (Figure 3.8) we have the
classical MAC studied by Ahlswede [2, 3] and van der Meulen [209].
Ahlswede obtained a simple characterization of the capacity region.

Theorem 3.7. For the regular MAC, when R0 = 0,


$$\mathcal{R}_r(P^*, E, W_M) = \Big\{(R_1, R_2) :\ R_1 \ge 0,\ R_2 \ge 0,\ \text{for } i = 1, 2,$$
$$R_i \le \min_{P, V : D(P \circ V \| P^* \circ W) \le E} \big|I_{P,V}(X_i \wedge X_{3-i}, Y | U) + D(P \circ V \| P^* \circ W) - E\big|^+,$$
$$R_1 + R_2 \le \min_{P, V : D(P \circ V \| P^* \circ W) \le E} \big|I_{P,V}(X_1 \wedge X_2 \wedge Y | U) + D(P \circ V \| P^* \circ W) - E\big|^+\Big\}$$

is the inner bound of the E-capacity region and

$$\mathcal{R}_{sp}(P, E, W_M) = \Big\{(R_1, R_2) :\ 0 \le R_i \le \min_{V : D(V \| W | P) \le E} I_{P,V}(X_i \wedge Y | X_{3-i}),\ i = 1, 2,$$
$$R_1 + R_2 \le \min_{V : D(V \| W | P) \le E} I_{P,V}(X_1, X_2 \wedge Y)\Big\}$$

is the outer bound.

In [166] it was proved that in this case it is enough to consider


|U| = 4.

Asymmetric MAC. The model with $M_1 = 1$ (Figure 3.9) is called the asymmetric MAC. It was considered by Haroutunian [101] and by van der Meulen [212].
Here one of the two messages is available only to one encoder, whereas the other message is available to both encoders.

Theorem 3.8. For the asymmetric MAC, when R1 = 0,


$$\mathcal{R}_r(P, E, W_M) = \Big\{(R_0, R_2) :\ 0 \le R_2 \le \min_{V : D(V \| W | P) \le E} \big|I_{P,V}(X_2 \wedge Y | X_1) + D(V \| W | P) - E\big|^+,$$
$$R_0 + R_2 \le \min_{V : D(V \| W | P) \le E} \big|I_{P,V}(X_1, X_2 \wedge Y) + D(V \| W | P) - E\big|^+\Big\},$$

Fig. 3.9 Asymmetric MAC.



and
$$\mathcal{R}_{sp}(P, E, W_M) = \Big\{(R_0, R_2) :\ 0 \le R_2 \le \min_{V : D(V \| W | P) \le E} I_{P,V}(X_2 \wedge Y | X_1),$$
$$R_0 + R_2 \le \min_{V : D(V \| W | P) \le E} I_{P,V}(X_1, X_2 \wedge Y)\Big\}.$$

In this case, when E → 0, outer and inner bounds are equal and
coincide with the capacity region of the asymmetric MAC [101, 212].
MAC with cribbing encoders. Willems [217] and Willems and
van der Meulen [219] investigated MAC with cribbing encoders in
various communication situations and established the corresponding
capacity regions. We shall consider only one of these configurations (Figure 3.10), investigated by van der Meulen [211], in which the first encoder has information about the codeword produced by the second encoder.

Theorem 3.9. For the MAC with cribbing encoders the outer and
inner bounds are:
$$\mathcal{R}_r(P, E, W_M) = \Big\{(R_1, R_2) :\ R_1 \ge 0,\ R_2 \ge 0,$$
$$R_i \le \min_{V : D(V \| W | P) \le E} \big|I_{P,V}(X_i \wedge Y | X_{3-i}) + D(V \| W | P) - E\big|^+,\ i = 1, 2,$$
$$R_1 + R_2 \le \min_{V : D(V \| W | P) \le E} \big|I_{P,V}(X_1, X_2 \wedge Y) + D(V \| W | P) - E\big|^+\Big\},$$

Fig. 3.10 MAC with cribbing encoders.



and
$$\mathcal{R}_{sp}(P, E, W_M) = \Big\{(R_1, R_2) :\ R_1 \ge 0,\ R_2 \ge 0,\quad R_i \le \min_{V : D(V \| W | P) \le E} I_{P,V}(X_i \wedge Y | X_{3-i}),\ i = 1, 2,$$
$$R_1 + R_2 \le \min_{V : D(V \| W | P) \le E} I_{P,V}(X_1, X_2 \wedge Y)\Big\}.$$

For $E \to 0$ they coincide with the capacity region of this channel.

Channel with two inputs and two outputs. This channel


is defined by a matrix of transition probabilities WM =
{W (y1 , y2 |x1 , x2 ), x1 ∈ X1 , x2 ∈ X2 , y1 ∈ Y1 , y2 ∈ Y2 }.
The two input–two output MAC, first considered by Ahlswede
[3], may also be interpreted as a compound MAC with informed
decoder (see Figure 3.11). In [3] the capacity region of this chan-
nel was determined. In [169] the capacity regions are found for com-
pound MAC with common information and compound MAC with
conferencing.
Denote
$$W_i(y_i | x_1, x_2) = \sum_{y_j} W(y_1, y_2 | x_1, x_2), \quad i = 1, 2;\ j = 3 - i.$$

We use the following PDs:
$$P_0 = \{P_0(u),\ u \in \mathcal{U}\},$$
$$P = \{P(u, x_1, x_2) = P_0(u) P(x_1, x_2 | u),\ x_1 \in \mathcal{X}_1,\ x_2 \in \mathcal{X}_2\},$$

Fig. 3.11 Channel with two inputs and two outputs.


$$P_i = \Big\{P_i(x_i | u) = \sum_{x_{3-i}} P(x_1, x_2 | u),\ x_i \in \mathcal{X}_i\Big\}, \quad i = 1, 2,$$
$$P^* = \{P_0(u) P^*(x_1, x_2 | u) = P_0(u) P_1(x_1 | u) P_2(x_2 | u),\ x_1 \in \mathcal{X}_1,\ x_2 \in \mathcal{X}_2\},$$
$$P \circ V_i = \{P_0(u) P(x_1, x_2 | u) V_i(y_i | x_1, x_2),\ x_1 \in \mathcal{X}_1,\ x_2 \in \mathcal{X}_2,\ y_i \in \mathcal{Y}_i\},\ i = 1, 2,$$
where $V_1, V_2$ are probability matrices.


Let us consider the random coding region Rr (E, WM )
n
Rr (P ∗ , E, WM ) = (R1 , R2 ) : R1 ≥ 0, R2 ≥ 0,

R1 ≤ min min
i=1,2 P,Vi :D(P ◦Vi kP ∗ ◦Wi )≤Ei

× |IP,Vi (X1 ∧ Yi |X2 , U ) + D(P ◦ Vi kP ∗ ◦ Wi ) − Ei |+ ,


R2 ≤ min min
i=1,2 P,Vi :D(P ◦Vi kP ∗ ◦Wi )≤Ei

× |IP,Vi (X2 ∧ Yi |X1 , U ) + D(P ◦ Vi kP ∗ ◦ Wi ) − Ei |+ ,


R1 + R2 ≤ min min ∗
i=1,2 P,Vi :D(P ◦Vi kP ◦Wi )≤Ei
o
× |IP,Vi (Yi ∧ X1 X2 |U ) + D(P ◦ Vi kP ∗ ◦ Wi ) − Ei |+ ,

and
[
Rr (E, WM ) = Rr (P ∗ , E, WM ).
P∗

The following theorem is proved in [135].

Theorem 3.10. For all E = (E1 , E2 ), E1 > 0, E2 > 0,

Rr (E, WM ) ⊆ C(E, WM ).

When E1 → 0, E2 → 0, this bound coincides with the capacity region


constructed in [3], where it was determined that |U| ≤ 6.
4
E -capacity of Varying Channels

4.1 Bounds on E-capacity for the Compound Channels


Let X , Y, S be finite sets and the transition probabilities of a DMC,
with input alphabet X and output alphabet Y, depend on a parameter
$s$ with values in $\mathcal{S}$. In other words, we have a set of DMCs
$$W_s = \{W(y | x, s),\ x \in \mathcal{X},\ y \in \mathcal{Y}\}, \quad s \in \mathcal{S}.$$
The parameter $s$ may change according to various rules, and depending on these rules different channel models arise.
Varying channels can be considered in different situations, in which the state of the channel is known or unknown at the encoder and decoder.
The DMC is called a compound channel $W_C$ (DCC) if the state $s \in \mathcal{S}$ of the channel remains fixed during the transmission of one codeword of length $N$, but can change arbitrarily for the transmission of the next codeword. This channel can be considered in four cases, according to whether the current state $s$ of the channel is known or unknown at the encoder and at the decoder.
The notions of rate, error probability, E-capacity are defined simi-
larly to those of DMC (see Section 2).


The DCC was first studied by Blackwell et al. [33] and Dobrushin [56]. The capacity was found by Wolfowitz [221], who showed that knowledge of the state $s$ at the decoder does not improve the asymptotic characteristics of the channel; so it is enough to study the channel in two cases.
As for the DMC, the capacities of the compound channel for average and maximal error probabilities are the same.
In the book by Csiszár and Körner [51] the random coding and sphere packing bounds for the reliability function of the DCC are given. We shall formulate the sphere packing, random coding, and expurgated bounds of the E-capacity of the DCC.
Let us denote by $C(E, W_C)$ the E-capacity of the DCC in the case when the state $s$ is unknown at the encoder and decoder. When $s$ is known at the encoder and decoder, the E-capacity will be denoted by $\hat{C}(E, W_C)$.
Let us introduce the following functions:
$$R_{sp}(E, W_C) \triangleq \max_P \min_{s \in \mathcal{S}}\ \min_{V : D(V \| W_s | P) \le E} I_{P,V}(X \wedge Y),$$
$$\hat{R}_{sp}(E, W_C) \triangleq \min_{s \in \mathcal{S}} \max_P\ \min_{V : D(V \| W_s | P) \le E} I_{P,V}(X \wedge Y).$$
Consider also the functions
$$R_r(E, W_C) \triangleq \max_P \min_{s \in \mathcal{S}}\ \min_{V : D(V \| W_s | P) \le E} \big|I_{P,V}(X \wedge Y) + D(V \| W_s | P) - E\big|^+,$$
$$\hat{R}_r(E, W_C) \triangleq \min_{s \in \mathcal{S}} \max_P\ \min_{V : D(V \| W_s | P) \le E} \big|I_{P,V}(X \wedge Y) + D(V \| W_s | P) - E\big|^+,$$
and
$$R_x(E, W_C) \triangleq \max_P \min_{s \in \mathcal{S}}\ \min_{P_{X\overline{X}} :\, P_X = P_{\overline{X}} = P} \big\{I_P(X \wedge \overline{X}) + |\mathbb{E}\, d_{W,s}(X, \overline{X}) - E|^+\big\},$$
$$\hat{R}_x(E, W_C) \triangleq \min_{s \in \mathcal{S}} \max_P\ \min_{P_{X\overline{X}} :\, P_X = P_{\overline{X}} = P} \big\{I_P(X \wedge \overline{X}) + |\mathbb{E}\, d_{W,s}(X, \overline{X}) - E|^+\big\}.$$

Theorem 4.1. For any DCC, any E > 0 the following inequalities hold

max(Rr (E, WC ), Rx (E, WC )) ≤ C(E, WC ) ≤ Rsp (E, WC ),


max(R̂r (E, WC ), R̂x (E, WC )) ≤ Ĉ(E, WC ) ≤ R̂sp (E, WC ).

The proof of the lower bounds is a modification of the proof for


DMC by the method of graph decomposition expounded in Section 2.6
and in [107].
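The inner minimizations in these bounds can be approximated numerically for small alphabets. The following rough Python sketch (ours; binary alphabets, two hypothetical states, and a grid search standing in for the exact minimization over $V$) evaluates the inner part of $R_{sp}(E, W_C)$ for a fixed input PD $P$; the outer maximization over $P$ can be added as a coarse grid over $P$ in the same way:

```python
import numpy as np

def cond_div(V, W, P):
    """D(V||W|P) = sum_x P(x) sum_y V(y|x) log(V(y|x)/W(y|x)), in nats."""
    return float((P[:, None] * V * np.log(V / W)).sum())

def mutual_inf(P, V):
    """I_{P,V}(X ∧ Y) in nats."""
    q = P @ V
    return float((P[:, None] * V * np.log(V / q[None, :])).sum())

def inner_min(P, Ws, E, grid=201):
    """min over s of min over binary V with D(V||Ws|P) <= E of I_{P,V}(X∧Y),
    i.e., the inner part of Rsp(E, WC) for a fixed input PD P."""
    ts = np.linspace(1e-6, 1 - 1e-6, grid)  # keep V strictly positive
    best = np.inf
    for W in Ws:
        for v0 in ts:            # V(0|0)
            for v1 in ts:        # V(0|1)
                V = np.array([[v0, 1 - v0], [v1, 1 - v1]])
                if cond_div(V, W, P) <= E:
                    best = min(best, mutual_inf(P, V))
    return best

Ws = [np.array([[0.9, 0.1], [0.2, 0.8]]),
      np.array([[0.7, 0.3], [0.1, 0.9]])]   # two hypothetical states
P = np.array([0.5, 0.5])
print(inner_min(P, Ws, E=0.05))
```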

4.2 Channels with Random Parameter


The channel WQ with random parameter (CRP) is a family of discrete
memoryless channels Ws : X → Y, where s is the channel state, varying
independently in each moment of the channel action with the same
known PD Q(s) on S.
The considered channel is memoryless and stationary, that is for
an N -length input word x = (x1 , x2 , . . . , xN ) ∈ X N , the output word
y = (y1 , y2 , . . . , yN ) ∈ Y N and states sequence s = (s1 , s2 , . . . , sN ) ∈ S N ,
the transition probabilities are
$$W^N(\mathbf{y} | \mathbf{x}, \mathbf{s}) = \prod_{n=1}^N W(y_n | x_n, s_n), \qquad Q^N(\mathbf{s}) = \prod_{n=1}^N Q(s_n).$$
Let again $\mathcal{M}$ be the message set and $M$ be its cardinality.
Let again M be the message set and M be its cardinality.
First we consider the situation with state sequence known to the
sender and unknown to the receiver.
The code for such a channel is defined by an encoding $f : \mathcal{M} \times \mathcal{S}^N \to \mathcal{X}^N$ and a decoding $g : \mathcal{Y}^N \to \mathcal{M}$. The number $R(f, g, N) = (1/N) \log M$ is called the code rate. Denote by


$$e(m, \mathbf{s}) = e(f, g, N, m, \mathbf{s}) \triangleq W^N(\mathcal{Y}^N - g^{-1}(m) \,|\, f(m, \mathbf{s}), \mathbf{s}) \qquad (4.1)$$
the probability of erroneous transmission of the message $m$ for a given state sequence $\mathbf{s}$. The maximal $e_Q$ and the average $\overline{e}_Q$ error probabilities of the code $(f, g)$ for the channel $W_Q$ are, correspondingly,
$$e_Q = e(f, g, N, W_Q) \triangleq \max_{m \in \mathcal{M}} \sum_{\mathbf{s} \in \mathcal{S}^N} Q^N(\mathbf{s}) e(m, \mathbf{s}), \qquad (4.2)$$
$$\overline{e}_Q = \overline{e}(f, g, N, W_Q) \triangleq \frac{1}{M} \sum_{m \in \mathcal{M}} \sum_{\mathbf{s} \in \mathcal{S}^N} Q^N(\mathbf{s}) e(m, \mathbf{s}). \qquad (4.3)$$

The E-capacity of the channel $W_Q$ for average error probability is denoted by $\overline{C}(E, W_Q)$.
The CRP with additional information at the encoder was first considered by Shannon [193]. He studied the situation where, when choosing the input symbol $x_n$, $n = 1, \dots, N$, one knows the states $s_m$ of the channel for $m \le n$, while the states $s_m$ for $m > n$ are unknown. In the literature this model is referred to as causal side information.
Gelfand and Pinsker [86] determined the capacity of the CRP for average error probability in the case when, for the choice of the codeword $\mathbf{x}$, one needs to know the whole sequence $\mathbf{s}$, in other words, in the case of non-causal side information. As for the DMC, for the CRP the capacities for average and maximal error probabilities are also equal.
Let U, S, X, Y be RVs with values in finite sets U, S, X , Y, respec-
tively, with PDs Q(s), P (u, x|s), and V (y|x, s), s ∈ S, u ∈ U, x ∈ X ,
y ∈ Y. Let us denote
$$R_{sp}(Q, P, V) \triangleq I_{Q,P,V}(Y \wedge X | S), \qquad (4.4)$$
$$R_r(Q, P, V) \triangleq I_{Q,P,V}(Y \wedge U) - I_{Q,P}(S \wedge U)$$
$$= I_{Q,P,V}(Y \wedge X | S) - I_{Q,P,V}(S \wedge U | Y) - I_{Q,P,V}(Y \wedge X | U, S). \qquad (4.5)$$
Consider the following functions:
$$R_{sp}(E, W_Q) \triangleq \min_{Q' \in \mathcal{Q}(\mathcal{S})} \max_P\ \min_{V : D(Q' \circ P \circ V \| Q \circ P \circ W) \le E} R_{sp}(Q', P, V),$$
$$R_r(E, W_Q) \triangleq \min_{Q' \in \mathcal{Q}(\mathcal{S})} \max_P\ \min_{V : D(Q' \circ P \circ V \| Q \circ P \circ W) \le E} \big|R_r(Q', P, V) + D(Q' \circ P \circ V \| Q \circ P \circ W) - E\big|^+.$$
The following theorem was proved by M. Haroutunian in [128, 131].

Theorem 4.2. For all $E > 0$, for the CRP with state sequence known to the sender the following inequalities are valid:
$$R_r(E, W_Q) \le C(E, W_Q) \le \overline{C}(E, W_Q) \le R_{sp}(E, W_Q).$$

Note that when $E \to 0$ we obtain the upper and the lower bounds for the capacity of the channel $W_Q$:
$$R_{sp}(W_Q) \triangleq \max_P R_{sp}(Q, P, W), \qquad R_r(W_Q) \triangleq \max_P R_r(Q, P, W),$$

where Rr (WQ ) coincides with the capacity C(WQ ) of the CRP, obtained
by Gelfand and Pinsker [85]. They also proved that it is enough to
consider RV U with |U| ≤ |X | + |S|.
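The Gelfand–Pinsker functional (4.5) is directly computable for finite alphabets. A minimal sketch (ours; the array layouts are assumed conventions) evaluates $I(Y \wedge U) - I(S \wedge U)$ from given $Q$, $P$, and channel $W$; maximizing it over $P$ (e.g., by search) would approximate the capacity $C(W_Q)$:

```python
import numpy as np

def mi(p_ab):
    """I(A ∧ B) in nats from a joint PD p_ab[a, b]."""
    pa, pb = p_ab.sum(axis=1), p_ab.sum(axis=0)
    mask = p_ab > 0
    return float((p_ab[mask] *
                  np.log(p_ab[mask] / np.outer(pa, pb)[mask])).sum())

def gp_functional(Q, P, W):
    """R_r(Q,P,W) = I(Y ∧ U) - I(S ∧ U) for arrays with assumed layouts
    Q[s], P[s, u, x] = P(u,x|s), W[s, x, y] = W(y|x,s)."""
    p = (Q[:, None, None, None]            # Q(s)
         * P[:, :, :, None]                # P(u,x|s)
         * W[:, None, :, :])               # W(y|x,s)
    # p[s, u, x, y] is the full joint PD of (S, U, X, Y).
    return mi(p.sum(axis=(0, 2))) - mi(p.sum(axis=(2, 3)))
```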
For the model with states sequence known at the encoder and
decoder Rsp (E, WQ ) is the same, but upper and lower bounds coincide
for small E, because
$$R_r(E, W_Q) \triangleq \max_P\ \min_{Q', V : D(Q' \circ P \circ V \| Q \circ P \circ W) \le E} \big|I_{Q',P,V}(Y \wedge X | S) + D(Q' \circ P \circ V \| Q \circ P \circ W) - E\big|^+,$$
with $P = \{P(x | s),\ x \in \mathcal{X},\ s \in \mathcal{S}\}$.


When $E \to 0$ the limits of the bounds coincide and we obtain the capacity, which is the same for maximal and average error probabilities:
$$C(W_Q) = \overline{C}(W_Q) = \max_P I_{Q,P,W}(Y \wedge X | S).$$

The results for this and the next two cases are published in [117].
If the state sequence is unknown at the encoder and decoder, let us take $W^*(y|x) = \sum_{s \in \mathcal{S}} Q(s) W(y | x, s)$. Then the bounds take the following form:
$$R_{sp}(E, W_Q) \triangleq \max_P\ \min_{V : D(V \| W^* | P) \le E} I_{P,V}(Y \wedge X),$$
$$R_r(E, W_Q) \triangleq \max_P\ \min_{V : D(V \| W^* | P) \le E} \big|I_{P,V}(Y \wedge X) + D(V \| W^* | P) - E\big|^+,$$

with $P = \{P(x),\ x \in \mathcal{X}\}$, and for the capacity we obtain
$$C(W_Q) = \overline{C}(W_Q) = \max_P I_{P,W^*}(Y \wedge X).$$
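The capacity $\max_P I_{P,W^*}(Y \wedge X)$ of the averaged channel can be computed with the classical Blahut–Arimoto algorithm. A sketch (ours; array layouts assumed, values in nats):

```python
import numpy as np

def averaged_channel(Q, W):
    """W*(y|x) = sum_s Q(s) W(y|x,s); W has assumed layout W[s, x, y]."""
    return np.einsum('s,sxy->xy', Q, W)

def blahut_arimoto(W, tol=1e-9, max_iter=10000):
    """Capacity max_P I(X ∧ Y) of a DMC W[x, y], in nats."""
    nx = W.shape[0]
    p = np.full(nx, 1.0 / nx)
    il = 0.0
    for _ in range(max_iter):
        q = p @ W                                   # output PD
        with np.errstate(divide='ignore', invalid='ignore'):
            logratio = np.where(W > 0, np.log(W / q[None, :]), 0.0)
        c = np.exp((W * logratio).sum(axis=1))      # c(x)
        il, iu = np.log(p @ c), np.log(c.max())     # capacity bounds
        if iu - il < tol:
            return il
        p = p * c / (p @ c)                         # BA update
    return il
```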

In the case when the state sequence is known at the decoder and unknown at the encoder the following bounds are valid:
$$R_{sp}(E, W_Q) \triangleq \max_P\ \min_{Q', V : D(Q' \circ V \| Q \circ W | P) \le E} I_{Q',P,V}(Y, S \wedge X),$$
$$R_r(E, W_Q) \triangleq \max_P\ \min_{Q', V : D(Q' \circ V \| Q \circ W | P) \le E} \big|I_{Q',P,V}(Y, S \wedge X) + D(Q' \circ V \| Q \circ W | P) - E\big|^+,$$

with $P = \{P(x),\ x \in \mathcal{X}\}$ and
$$C(W_Q) = \overline{C}(W_Q) = \max_P I_{Q,P,W}(Y \wedge X | S).$$

Haroutunian in [128, 131] also studied the generalized CRP (GCRP). Let $\mathcal{P}' \subset \mathcal{P}(\mathcal{S})$ be a subset of the set of all PDs on $\mathcal{S}$. Consider the CRP where the PD $Q$ of the states remains fixed during a transmission of length $N$, but can change arbitrarily within $\mathcal{P}'$ for the next transmission. This channel was first considered by Ahlswede [8], who found the capacity, which is the same for average and maximal error probabilities. The following theorem holds.

Theorem 4.3. For the GCRP, for any $E > 0$,
$$R_r^{\mathcal{P}'}(E, W) \le C^{\mathcal{P}'}(E, W) \le \overline{C}^{\mathcal{P}'}(E, W) \le R_{sp}^{\mathcal{P}'}(E, W),$$
where
$$R_{sp}^{\mathcal{P}'}(E, W) \triangleq \inf_{Q \in \mathcal{P}'} R_{sp}(E, W_Q), \qquad R_r^{\mathcal{P}'}(E, W) \triangleq \inf_{Q \in \mathcal{P}'} R_r(E, W_Q).$$
Q∈P 0

4.3 Information Hiding Systems


We explore the model of an information hiding system with a single message, shown in Figure 4.1.
The message (watermark, fingerprint, etc.) needs to be embedded in the host data set (which can be blocks of audio, image, or video data) and to be reliably transmitted to a receiver via an unknown

Fig. 4.1 The model of information hiding system.



channel, called the attack channel, since the data may be subjected to random attacks modeled by a DMC. Side information, which can be cryptographic keys, properties of the host data, features of the audio, image, or video data, or locations of watermarks, is available to both the encoder and the decoder. The decoder does not know the DMC chosen by the attacker. The encoding and decoding functions are known to the attacker, but the side information is not.
The information hider introduces a certain distortion in the host data set by embedding the data. The attacker, trying to change or remove this hidden information, introduces some other distortion.
Several authors introduced and studied various models of data-
hiding systems. Results on capacity and error exponents problems have
been obtained by Moulin and O’Sullivan [175], Merhav [174], Somekh-
Baruch and Merhav [200, 201], Moulin and Wang [176].
Let the mappings $d_1 : \mathcal{S} \times \mathcal{X} \to [0, \infty)$, $d_2 : \mathcal{X} \times \mathcal{Y} \to [0, \infty)$ be single-letter symmetric distortion functions.
The information hiding $N$-length code is a pair of mappings $(f, g)$ subject to distortion $\Delta_1$, where
$$f : \mathcal{M} \times \mathcal{S}^N \times \mathcal{K}^N \to \mathcal{X}^N$$
is the encoder, mapping a host data block $\mathbf{s}$, a message $m$, and side information $\mathbf{k}$ to a sequence $\mathbf{x} = f(\mathbf{s}, m, \mathbf{k})$ which satisfies the following distortion constraint:
$$d_1^N(\mathbf{s}, f(\mathbf{s}, m, \mathbf{k})) \le \Delta_1,$$
and
$$g : \mathcal{Y}^N \times \mathcal{K}^N \to \mathcal{M}$$
is the decoding.
An attack channel, subject to distortion $\Delta_2$, satisfies the following condition:
$$\sum_{\mathbf{x} \in \mathcal{X}^N} \sum_{\mathbf{y} \in \mathcal{Y}^N} d_2^N(\mathbf{x}, \mathbf{y}) A(\mathbf{y} | \mathbf{x}) p^N(\mathbf{x}) \le \Delta_2.$$
x∈X N y∈Y N

The $N$-length memoryless expression for the attack channel $A$ is
$$A(\mathbf{y} | \mathbf{x}) = \prod_{n=1}^N A(y_n | x_n).$$
The nonnegative number $R(f, g, N) = N^{-1} \log |\mathcal{M}|$ is called the information hiding code rate.
mation hiding code rate.
We use an auxiliary RV U , taking values in the finite set U and
forming the Markov chain (K, S, U ) → X → Y .
A memoryless covert channel $P$, subject to a distortion level $\Delta_1$, is a PD $P = \{P(u, x | s, k),\ u \in \mathcal{U},\ x \in \mathcal{X},\ s \in \mathcal{S},\ k \in \mathcal{K}\}$ such that for any $Q$
$$\sum_{u, x, s, k} Q(s, k) P(u, x | s, k) d_1(s, x) \le \Delta_1. \qquad (4.6)$$

Denote by P(Q, ∆1 ) the set of all covert channels, subject to dis-


tortion level ∆1 . The N -length memoryless expression for the covert
channel P is
$$P(\mathbf{u}, \mathbf{x} | \mathbf{s}, \mathbf{k}) = \prod_{n=1}^N P(u_n, x_n | s_n, k_n).$$

A memoryless attack channel A, subject to distortion level ∆2 ,


under the condition of covert channel P ∈ P(Q, ∆1 ), is defined by a
PD A = {A(y|x), y ∈ Y, x ∈ X } such that
X
d2 (x, y)A(y|x)P (u, x|s, k)Q(s, k) ≤ ∆2 .
u,x,y,s,k

The set of all attack channels under the condition of covert channel $P \in \mathcal{P}(Q, \Delta_1)$ and subject to distortion level $\Delta_2$ is denoted by $\mathcal{A}(Q, P, \Delta_2)$. The sets $\mathcal{P}(Q, \Delta_1)$ and $\mathcal{A}(Q, P, \Delta_2)$ are defined by linear inequality constraints and hence are convex.
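Because each set is cut out by one linear inequality, membership of a given pair $(P, A)$ is a direct computation. A small sketch (ours; array layouts are assumed conventions):

```python
import numpy as np

def covert_ok(Q, P, d1, delta1):
    """Check (4.6): sum Q(s,k) P(u,x|s,k) d1(s,x) <= Delta1.
    Assumed layouts: Q[s,k], P[s,k,u,x], d1[s,x]."""
    avg = np.einsum('sk,skux,sx->', Q, P, d1)
    return avg <= delta1

def attack_ok(Q, P, A, d2, delta2):
    """Check the attack-channel constraint:
    sum d2(x,y) A(y|x) P(u,x|s,k) Q(s,k) <= Delta2; A[x,y], d2[x,y]."""
    px = np.einsum('sk,skux->x', Q, P)      # induced input PD on X
    avg = np.einsum('x,xy,xy->', px, A, d2)
    return avg <= delta2
```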
The error probability of the message $m$ averaged over all $(\mathbf{s}, \mathbf{k}) \in \mathcal{S}^N \times \mathcal{K}^N$ equals
$$e(f, g, N, m, Q, A) \triangleq \sum_{(\mathbf{s}, \mathbf{k}) \in \mathcal{S}^N \times \mathcal{K}^N} Q(\mathbf{s}, \mathbf{k}) A\{\mathcal{Y}^N - g^{-1}(m | \mathbf{k}) \,|\, f(m, \mathbf{s}, \mathbf{k})\}.$$

We consider the maximal $e(f, g, N, Q, P, \Delta_2)$ and the average $\overline{e}(f, g, N, Q, P, \Delta_2)$ error probabilities of the code, maximized over all attack channels from $\mathcal{A}(Q, P, \Delta_2)$.
The following lower bound on the information hiding E-capacity for maximal and average error probabilities was constructed by M. Haroutunian and Tonoyan in [137]:

$$R_r(Q^*, E, \Delta_1, \Delta_2) = \min_Q \max_{P \in \mathcal{P}(Q, \Delta_1)}\ \min_{A \in \mathcal{A}(Q, P, \Delta_2)}\ \min_{V : D(Q \circ P \circ V \| Q^* \circ P \circ A) \le E}$$
$$\big|I_{Q,P,V}(Y \wedge U | K) - I_{Q,P}(S \wedge U | K) + D(Q \circ P \circ V \| Q^* \circ P \circ A) - E\big|^+. \qquad (4.7)$$

As $E \to 0$ we obtain the lower bound on the information hiding capacity:
$$R_r(Q^*, \Delta_1, \Delta_2) = \max_{P \in \mathcal{P}(Q^*, \Delta_1)}\ \min_{A \in \mathcal{A}(Q^*, P, \Delta_2)} \{I_{Q^*,P,A}(Y \wedge U | K) - I_{Q^*,P}(S \wedge U | K)\}, \qquad (4.8)$$

which coincides with the information hiding capacity, obtained by


Moulin and O’Sullivan [175].
In Figure 4.2 the random coding bound of E-capacity (4.7) is illus-
trated for three particular cases. In the first case ∆1 = 0.4, and ∆2 = 0.7
(curve “1”), in the second case ∆1 = 0.7, ∆2 = 0.7 (“2”) and in the third
case ∆1 = 0.4 and ∆2 = 0.8 (“3”).
In Figures 4.3(a), 4.3(b), and 4.3(c) the capacity function (4.8) is
depicted.

Fig. 4.2 The random coding bound of E-capacity.



Fig. 4.3 The capacity function.

The surface in Figure 4.3(a) illustrates the dependence of the information hiding rate on the distortion levels. The curves in Figures 4.3(b) and 4.3(c) are intersections of the rate surface with planes parallel to the $\Delta_1$ and $\Delta_2$ axes, respectively.
Recent results for the E-capacity of the information hiding system with multiple messages and for the reversible information hiding system are presented in [136, 138, 139] and [204].

4.4 Multiple-Access Channels with Random Parameter


Let X1 , X2 , Y, S be finite sets. The transition probabilities of a discrete
memoryless multiple-access channel with two encoders and one decoder
depend on a parameter $s$ with values in $\mathcal{S}$. In other words, we have a set of conditional probabilities
$$W_s = \{W(y | x_1, x_2, s),\ x_1 \in \mathcal{X}_1,\ x_2 \in \mathcal{X}_2,\ y \in \mathcal{Y}\}, \quad s \in \mathcal{S}.$$

The multiple-access channel $W_R$ with random parameter (MACRP) is a family of discrete memoryless multiple-access channels $W_s : \mathcal{X}_1 \times \mathcal{X}_2 \to \mathcal{Y}$, where $s$ is the channel state, varying independently at each moment with the same PD $Q(s)$ on $\mathcal{S}$.
Time varying MAC was explored in [13, 14, 55, 149, 181, 202]. In
[133, 134] M. Haroutunian studied the MAC with random parameter
in various situations, when the whole state sequence s is known or
unknown at the encoders and at the decoder.
Denote by e(m1 , m2 , s) the probability of erroneous transmission of
the messages m1 ∈ M1 , m2 ∈ M2 for given s. We study the average
error probability of the code:
$$\overline{e}(N, W_R) = \frac{1}{M_1 M_2} \sum_{m_1, m_2} \sum_{\mathbf{s} \in \mathcal{S}^N} Q^N(\mathbf{s}) e(m_1, m_2, \mathbf{s}).$$

When $\mathbf{s}$ is known at the encoders and the decoder, the code of length $N$ is a collection of mappings $(f_1, f_2, g)$, where $f_1 : \mathcal{M}_1 \times \mathcal{S}^N \to \mathcal{X}_1^N$ and $f_2 : \mathcal{M}_2 \times \mathcal{S}^N \to \mathcal{X}_2^N$ are the encodings and $g : \mathcal{Y}^N \times \mathcal{S}^N \to \mathcal{M}_1 \times \mathcal{M}_2$ is the decoding. Denote
$$f_1(m_1, \mathbf{s}) = \mathbf{x}_1(m_1, \mathbf{s}), \quad f_2(m_2, \mathbf{s}) = \mathbf{x}_2(m_2, \mathbf{s}),$$
$$g^{-1}(m_1, m_2, \mathbf{s}) = \{\mathbf{y} : g(\mathbf{y}, \mathbf{s}) = (m_1, m_2)\};$$

then
$$e(m_1, m_2, \mathbf{s}) = W^N\{\mathcal{Y}^N - g^{-1}(m_1, m_2, \mathbf{s}) \,|\, f_1(m_1, \mathbf{s}), f_2(m_2, \mathbf{s}), \mathbf{s}\}$$
is the probability of erroneous transmission of messages $m_1$ and $m_2$.


Let RVs $X_1, X_2, Y, S$ take values in alphabets $\mathcal{X}_1, \mathcal{X}_2, \mathcal{Y}, \mathcal{S}$, respectively, with the following PDs:
$$Q = \{Q(s),\ s \in \mathcal{S}\},$$
$$P_i^* = \{P_i^*(x_i | s),\ x_i \in \mathcal{X}_i\},\ i = 1, 2,$$
$$P^* = \{P_1^*(x_1 | s) P_2^*(x_2 | s),\ x_1 \in \mathcal{X}_1,\ x_2 \in \mathcal{X}_2\},$$
$$P = \{P(x_1, x_2 | s),\ x_1 \in \mathcal{X}_1,\ x_2 \in \mathcal{X}_2\},$$
with
$$\sum_{x_{3-i}} P(x_1, x_2 | s) = P_i^*(x_i | s),\ i = 1, 2,$$
and joint PD
$$Q \circ P \circ V = \{Q(s) P(x_1, x_2 | s) V(y | x_1, x_2, s),\ s \in \mathcal{S},\ x_1 \in \mathcal{X}_1,\ x_2 \in \mathcal{X}_2,\ y \in \mathcal{Y}\},$$
where
$$V = \{V(y | x_1, x_2, s),\ s \in \mathcal{S},\ x_1 \in \mathcal{X}_1,\ x_2 \in \mathcal{X}_2,\ y \in \mathcal{Y}\}$$
is some conditional PD.


The following region is called the random coding bound
M
[
Rr (E, WR ) = Rr (P ∗ , E, WR ),
P∗

with
n
M
Rr (P ∗ , E, WR ) = (R1 , R2 ) :

0 ≤ R1 ≤ min IQ0 ,P,V (X1 ∧ X2 , Y |S)
Q0 ,P,V :D(Q0 ◦P ◦V kQ◦P ∗ ◦W )≤E
+
+ D(Q0 ◦ P ◦ V kQ ◦ P ∗ ◦ W ) − E ,

0 ≤ R2 ≤ min IQ0 ,P,V (X2 ∧ X1 , Y |S)
Q0 ,P,V :D(Q0 ◦P ◦V kQ◦P ∗ ◦W )≤E
+
+ D(Q0 ◦ P ◦ V kQ ◦ P ∗ ◦ W ) − E ,
R1 + R2 ≤ min |IQ0 ,P,V (X1 , X2 ∧ Y |S)
Q0 ,P,V :D(Q0 ◦P ◦V kQ◦P ∗ ◦W )≤E
o
+ IQ0 ,P (X1 ∧ X2 |S) + D(Q0 ◦ P ◦ V kQ ◦ P ∗ ◦ W ) − E|+ .

The next region is called the sphere packing bound:
$$\mathcal{R}_{sp}(E, W_R) \triangleq \bigcup_P \mathcal{R}_{sp}(P, E, W_R),$$
where
$$\mathcal{R}_{sp}(P, E, W_R) \triangleq \Big\{(R_1, R_2) :\ 0 \le R_1 \le \min_{Q', V : D(Q' \circ P \circ V \| Q \circ P \circ W) \le E} I_{Q',P,V}(X_1 \wedge Y | X_2, S),$$
$$0 \le R_2 \le \min_{Q', V : D(Q' \circ P \circ V \| Q \circ P \circ W) \le E} I_{Q',P,V}(X_2 \wedge Y | X_1, S),$$
$$R_1 + R_2 \le \min_{Q', V : D(Q' \circ P \circ V \| Q \circ P \circ W) \le E} I_{Q',P,V}(X_1, X_2 \wedge Y | S)\Big\}.$$

The following theorem is proved by M. Haroutunian in [133].

Theorem 4.4. For all E > 0, for MAC with random parameter the
following inclusions are valid

Rr (E, WR ) ⊆ C(E, WR ) ⊆ Rsp (E, WR ).

Corollary 4.1. When E → 0, we obtain the inner and outer estimates


for the channel capacity region, the expressions of which are similar but
differ in the PDs P and P ∗ . The inner bound is
$$\mathcal{R}_r(P^*, W_R) \triangleq \big\{(R_1, R_2) :\ 0 \le R_i \le I_{Q,P^*,W}(X_i \wedge Y | X_{3-i}, S),\ i = 1, 2,$$
$$R_1 + R_2 \le I_{Q,P^*,W}(X_1, X_2 \wedge Y | S)\big\}.$$

For this model, when the states are unknown at the encoders and the decoder, the mappings $(f_1, f_2, g)$ are $f_1 : \mathcal{M}_1 \to \mathcal{X}_1^N$, $f_2 : \mathcal{M}_2 \to \mathcal{X}_2^N$, and $g : \mathcal{Y}^N \to \mathcal{M}_1 \times \mathcal{M}_2$. Then
$$f_1(m_1) = \mathbf{x}_1(m_1), \quad f_2(m_2) = \mathbf{x}_2(m_2), \quad g^{-1}(m_1, m_2) = \{\mathbf{y} : g(\mathbf{y}) = (m_1, m_2)\},$$
and the probability of erroneous transmission of messages $m_1$ and $m_2$ is
$$e(m_1, m_2, \mathbf{s}) = W^N\{\mathcal{Y}^N - g^{-1}(m_1, m_2) \,|\, f_1(m_1), f_2(m_2), \mathbf{s}\}.$$



Consider the distributions
$$Q = \{Q(s),\ s \in \mathcal{S}\}, \quad P_i^* = \{P_i^*(x_i),\ x_i \in \mathcal{X}_i\},\ i = 1, 2,$$
$$P^* = \{P_1^*(x_1) P_2^*(x_2)\}, \quad P = \{P(x_1, x_2)\}, \quad V = \{V(y | x_1, x_2)\},$$
and
$$W^*(y | x_1, x_2) = \sum_{s \in \mathcal{S}} Q(s) W(y | x_1, x_2, s).$$

In this case the bounds in Theorem 4.4 take the following form:
$$\mathcal{R}_r(P^*, E, W_R) \triangleq \Big\{(R_1, R_2) :\ R_1 \ge 0,\ R_2 \ge 0,$$
$$R_1 \le \min_{P, V : D(P \circ V \| P^* \circ W^*) \le E} \big|I_{P,V}(X_1 \wedge X_2, Y) + D(P \circ V \| P^* \circ W^*) - E\big|^+,$$
$$R_2 \le \min_{P, V : D(P \circ V \| P^* \circ W^*) \le E} \big|I_{P,V}(X_2 \wedge X_1, Y) + D(P \circ V \| P^* \circ W^*) - E\big|^+,$$
$$R_1 + R_2 \le \min_{P, V : D(P \circ V \| P^* \circ W^*) \le E} \big|I_{P,V}(X_1, X_2 \wedge Y) + I_P(X_1 \wedge X_2) + D(P \circ V \| P^* \circ W^*) - E\big|^+\Big\},$$
and
$$\mathcal{R}_{sp}(P, E, W_R) \triangleq \Big\{(R_1, R_2) :\ 0 \le R_1 \le \min_{V : D(V \| W^* | P) \le E} I_{P,V}(X_1 \wedge Y | X_2),$$
$$0 \le R_2 \le \min_{V : D(V \| W^* | P) \le E} I_{P,V}(X_2 \wedge Y | X_1), \quad R_1 + R_2 \le \min_{V : D(V \| W^* | P) \le E} I_{P,V}(X_1, X_2 \wedge Y)\Big\}.$$
V :D(V kW ∗ |P )≤E
When $E \to 0$, we obtain the inner and outer estimates for the channel capacity region, whose expressions, as in the previous case, are similar but differ by the PDs $P$ and $P^*$. The inner bound is
$$\mathcal{R}_r(P^*, W_R) \triangleq \{(R_1, R_2) :\ 0 \le R_i \le I_{P^*,W^*}(X_i \wedge Y | X_{3-i}),\ i = 1, 2,\ R_1 + R_2 \le I_{P^*,W^*}(X_1, X_2 \wedge Y)\}.$$

When the state is known at the decoder and unknown at the encoders, the code $(f_1, f_2, g)$ is a collection of mappings $f_1 : \mathcal{M}_1 \to \mathcal{X}_1^N$, $f_2 : \mathcal{M}_2 \to \mathcal{X}_2^N$, and $g : \mathcal{Y}^N \times \mathcal{S}^N \to \mathcal{M}_1 \times \mathcal{M}_2$. Then the probability of erroneous transmission of messages $m_1$ and $m_2$ is
$$e(m_1, m_2, \mathbf{s}) = W^N\{\mathcal{Y}^N - g^{-1}(m_1, m_2, \mathbf{s}) \,|\, f_1(m_1), f_2(m_2), \mathbf{s}\}.$$
For this model the following distributions participate in the formulation of the bounds:
$$P_i^* = \{P_i^*(x_i),\ x_i \in \mathcal{X}_i\},\ i = 1, 2, \quad P^* = \{P_1^*(x_1) P_2^*(x_2)\}, \quad P = \{P(x_1, x_2)\},$$
$$Q' = \{Q'(s | x_1, x_2)\}, \quad V = \{V(y | x_1, x_2, s)\}.$$

Then the bounds in Theorem 4.4 take the following form:
$$\mathcal{R}_r(P^*, E, W_R) \triangleq \Big\{(R_1, R_2) :$$
$$0 \le R_1 \le \min_{Q', P, V : D(Q' \circ P \circ V \| Q \circ P^* \circ W) \le E} \big|I_{Q',P,V}(X_1 \wedge X_2, S, Y) + D(Q' \circ P \circ V \| Q \circ P^* \circ W) - E\big|^+,$$
$$0 \le R_2 \le \min_{Q', P, V : D(Q' \circ P \circ V \| Q \circ P^* \circ W) \le E} \big|I_{Q',P,V}(X_2 \wedge X_1, S, Y) + D(Q' \circ P \circ V \| Q \circ P^* \circ W) - E\big|^+,$$
$$R_1 + R_2 \le \min_{Q', P, V : D(Q' \circ P \circ V \| Q \circ P^* \circ W) \le E} \big|I_{Q',P,V}(X_1, X_2 \wedge S, Y) + I_P(X_1 \wedge X_2) + D(Q' \circ P \circ V \| Q \circ P^* \circ W) - E\big|^+\Big\},$$
and
$$\mathcal{R}_{sp}(P, E, W_R) \triangleq \Big\{(R_1, R_2) :\ 0 \le R_1 \le \min_{Q', V : D(Q' \circ V \| Q \circ W | P) \le E} I_{Q',P,V}(X_1 \wedge Y, S | X_2),$$
$$0 \le R_2 \le \min_{Q', V : D(Q' \circ V \| Q \circ W | P) \le E} I_{Q',P,V}(X_2 \wedge Y, S | X_1),$$
$$R_1 + R_2 \le \min_{Q', V : D(Q' \circ V \| Q \circ W | P) \le E} I_{Q',P,V}(X_1, X_2 \wedge Y, S)\Big\}.$$

When $E \to 0$, we obtain the inner and outer estimates for the channel capacity region, whose expressions, as before, are similar but differ by the PDs $P$ and $P^*$. The inner bound is
$$\mathcal{R}_r(P^*, W_R) \triangleq \{(R_1, R_2) :\ 0 \le R_i \le I_{Q,P^*,W}(X_i \wedge Y | X_{3-i}, S),\ i = 1, 2,\ R_1 + R_2 \le I_{Q,P^*,W}(X_1, X_2 \wedge Y | S)\}.$$

The case when the states of the channel are known at the encoders and unknown at the decoder is characterized by encodings $f_1 : \mathcal{M}_1 \times \mathcal{S}^N \to \mathcal{X}_1^N$ and $f_2 : \mathcal{M}_2 \times \mathcal{S}^N \to \mathcal{X}_2^N$ and decoding $g : \mathcal{Y}^N \to \mathcal{M}_1 \times \mathcal{M}_2$. Then
$$e(m_1, m_2, \mathbf{s}) = W^N\{\mathcal{Y}^N - g^{-1}(m_1, m_2) \,|\, f_1(m_1, \mathbf{s}), f_2(m_2, \mathbf{s}), \mathbf{s}\}$$
is the probability of erroneous transmission of messages $m_1$ and $m_2$. Let the auxiliary RVs $U_1, U_2$ take values in some finite sets $\mathcal{U}_1, \mathcal{U}_2$, respectively. Then with the following PDs
$$Q = \{Q(s)\}, \quad P_i^* = \{P_i^*(u_i, x_i | s),\ x_i \in \mathcal{X}_i,\ u_i \in \mathcal{U}_i\},\ i = 1, 2,$$
$$P^* = \{P_1^*(u_1, x_1 | s) P_2^*(u_2, x_2 | s)\}, \quad P = \{P(u_1, u_2, x_1, x_2 | s)\},$$
and
$$V = \{V(y | x_1, x_2, s)\},$$
the random coding bound is written in the following way:
$$\mathcal{R}_r(P^*, E, W_R) = \Big\{(R_1, R_2) :$$
$$R_1 \le \min_{Q', P, V : D(Q' \circ P \circ V \| Q \circ P^* \circ W) \le E} \big|I_{Q',P,V}(U_1 \wedge U_2, Y) - I_{Q',P}(U_1 \wedge S) + D(Q' \circ P \circ V \| Q \circ P^* \circ W) - E\big|^+,$$
$$R_2 \le \min_{Q', P, V : D(Q' \circ P \circ V \| Q \circ P^* \circ W) \le E} \big|I_{Q',P,V}(U_2 \wedge U_1, Y) - I_{Q',P}(U_2 \wedge S) + D(Q' \circ P \circ V \| Q \circ P^* \circ W) - E\big|^+,$$
$$0 \le R_1 + R_2 \le \min_{Q', P, V : D(Q' \circ P \circ V \| Q \circ P^* \circ W) \le E} \big|I_{Q',P,V}(U_1, U_2 \wedge Y) - I_{Q',P}(U_1, U_2 \wedge S) + I_{Q',P}(U_1 \wedge U_2) + D(Q' \circ P \circ V \| Q \circ P^* \circ W) - E\big|^+\Big\}.$$
When $E \to 0$, we obtain the inner estimate for the channel capacity region:
$$\mathcal{R}_r(P^*, W_R) = \Big\{(R_1, R_2) :\ 0 \le R_i \le I_{Q,P^*,W}(U_i \wedge Y | U_{3-i}) - I_{Q,P^*}(U_i \wedge S),\ i = 1, 2,$$
$$R_1 + R_2 \le I_{Q,P^*,W}(U_1, U_2 \wedge Y) - I_{Q,P^*}(U_1, U_2 \wedge S)\Big\}.$$
∗ ∗

Another case is when the states are known at one of the encoders and unknown at the other encoder and at the decoder. For definiteness we assume that the first encoder has information about the state of the channel. Then the code consists of the following mappings: $f_1 : \mathcal{M}_1 \times \mathcal{S}^N \to \mathcal{X}_1^N$ and $f_2 : \mathcal{M}_2 \to \mathcal{X}_2^N$ are the encodings and $g : \mathcal{Y}^N \to \mathcal{M}_1 \times \mathcal{M}_2$ is the decoding. The probability of erroneous transmission of messages $m_1$ and $m_2$ is
$$e(m_1, m_2, \mathbf{s}) = W^N\{\mathcal{Y}^N - g^{-1}(m_1, m_2) \,|\, f_1(m_1, \mathbf{s}), f_2(m_2), \mathbf{s}\}.$$
Let the auxiliary RV $U$ take values in some finite set $\mathcal{U}$ and
$$Q = \{Q(s)\}, \quad P_1^* = \{P_1^*(u, x_1 | s)\}, \quad P_2^* = \{P_2^*(x_2)\},$$
$$P^* = \{P_1^*(u, x_1 | s) P_2^*(x_2)\}, \quad P = \{P(u, x_1, x_2 | s)\}, \quad V = \{V(y | x_1, x_2, s)\}.$$
The random coding bound of the E-capacity region will be
$$\mathcal{R}_r(P^*, E, W_R) = \Big\{(R_1, R_2) :$$
$$R_1 \le \min_{Q', P, V : D(Q' \circ P \circ V \| Q \circ P^* \circ W) \le E} \big|I_{Q',P,V}(U \wedge Y, X_2) - I_{Q',P_1^*}(U \wedge S) + D(Q' \circ P \circ V \| Q \circ P^* \circ W) - E\big|^+,$$
$$R_2 \le \min_{Q', P, V : D(Q' \circ P \circ V \| Q \circ P^* \circ W) \le E} \big|I_{Q',P,V}(X_2 \wedge Y, U) - I_{Q',P_1^*}(U \wedge S) + D(Q' \circ P \circ V \| Q \circ P^* \circ W) - E\big|^+,$$
$$0 \le R_1 + R_2 \le \min_{Q', P, V : D(Q' \circ P \circ V \| Q \circ P^* \circ W) \le E} \big|I_{Q',P,V}(U, X_2 \wedge Y) + I_{Q',P}(U \wedge X_2) - I_{Q',P_1^*}(U \wedge S) + D(Q' \circ P \circ V \| Q \circ P^* \circ W) - E\big|^+\Big\}.$$

Corollary 4.2. When E → 0, we obtain the inner estimate for the


channel capacity region:

$$\mathcal{R}_r(P^*, W_R) = \{(R_1, R_2) :\ 0 \le R_1 \le I_{Q,P^*,W}(U \wedge Y | X_2) - I_{Q,P_1^*}(U \wedge S),$$
$$0 \le R_2 \le I_{Q,P^*,W}(X_2 \wedge Y | U) - I_{Q,P_1^*}(U \wedge S), \quad R_1 + R_2 \le I_{Q,P^*,W}(U, X_2 \wedge Y) - I_{Q,P_1^*}(U \wedge S)\}.$$

4.5 Arbitrarily Varying Channels with State Sequence


Known to the Sender
The arbitrarily varying channel (AVC) is a discrete memoryless transmission system that depends on a state $s$, which may change in an arbitrary manner within a finite set $\mathcal{S}$.
A series of works is devoted to the investigation of the AVC in various situations [4, 15, 71, 146]. The capacity of the channel, when the state sequence is known to the sender, was found by Ahlswede [8]. In [146] the error exponent of the AVC was studied.
For the channel W the maximal and average error probabilities are,
correspondingly,
$$e \triangleq e(f, g, N, W) = \max_{m \in \mathcal{M}} \max_{\mathbf{s} \in \mathcal{S}^N} e(m, \mathbf{s}), \qquad \overline{e} \triangleq \overline{e}(f, g, N, W) = \max_{\mathbf{s} \in \mathcal{S}^N} \frac{1}{M} \sum_{m \in \mathcal{M}} e(m, \mathbf{s}).$$

In [127] and [131] M. Haroutunian obtained the following bounds of


E-capacity (Q(S) is the set of all distributions on S):
$$R_{sp}(E, W) \triangleq \min_{Q \in \mathcal{Q}(\mathcal{S})} \max_P\ \min_{V : D(V \| W | Q, P) \le E} R_{sp}(Q, P, V),$$
$$R_r(E, W) \triangleq \min_{Q \in \mathcal{Q}(\mathcal{S})} \max_P\ \min_{V : D(V \| W | Q, P) \le E} \big|R_r(Q, P, V) + D(V \| W | Q, P) - E\big|^+,$$
where $R_{sp}(Q, P, V)$ and $R_r(Q, P, V)$ are defined in (4.4) and (4.5), respectively.

Theorem 4.5. For all $E > 0$, for the arbitrarily varying channel with state sequence known to the sender the following inequalities are valid:
$$R_r(E, W) \le C(E, W) \le \overline{C}(E, W) \le R_{sp}(E, W).$$

Note that when $E \to 0$ we obtain the upper and the lower bounds for the capacity:
$$R_{sp}(W) \triangleq \min_{Q \in \mathcal{Q}(\mathcal{S})} R_{sp}(W_Q), \qquad R_r(W) \triangleq \min_{Q \in \mathcal{Q}(\mathcal{S})} R_r(W_Q).$$

Rr (W ) coincides with the capacity C(W ) of the arbitrarily varying


channel with state sequence known to the sender, found by Ahlswede
in [8].
5
Source Coding Rates Subject to Fidelity
and Reliability Criteria

5.1 Introductory Notes


In this section we expound the concept of the rate-reliability-distortion
function [143] for discrete memoryless sources (DMS).
The Shannon rate-distortion function [194] describes the dependence of the asymptotically minimal coding rate on a required average fidelity (distortion) threshold for noiseless transmission of a source. Another characteristic in source coding subject to a distortion criterion can also be considered, namely an exponential decrease in error probability with a desired exponent, or reliability. The maximum error exponent as a function of coding rate and distortion in the rate-distortion problem was specified
by Marton [171]. An alternative order dependence of the three param-
eters was examined by Haroutunian and Mekoush in [124]. They define
the rate-reliability-distortion function as the minimal rate at which the
messages of a source can be encoded and then reconstructed by the
receiver with an error probability that decreases exponentially with
the codeword length. Therefore, the achievability of the coding rate R
is considered as a function of a fixed distortion level ∆ ≥ 0 and an error
exponent E > 0. In a series of works [114, 115, 122, 123, 140, 170] and


[213] the coauthors and their collaborators successively extended this


idea to multiuser source coding problems. Recently this approach
was adopted by Tuncel and Rose [206].
Actually, the approach of [124] brings technical ease: having solved a general rate-reliability-distortion problem, one can readily convert the results to the rate-distortion setting by looking at the extremal values of the reliability, e.g., $E \to 0$ or $E \to \infty$. This is all the more useful when dealing with multiuser source coding problems. In particular, in the limit $E \to 0$ the rate-reliability-distortion function turns into the corresponding rate-distortion function.
In Sections 5.2–5.5, we elaborate on the concept of the rate-reliability-distortion function and its properties. In the case of exact ("zero-distortion") transmission the rate-reliability function is specified. The convexity properties of these functions, important from an information-theoretic point of view, are the focus of the discussion.

5.2 The Rate-Reliability-Distortion Function


The blockwise encoded messages of a DMS must be transmitted to the receiver. The decoder, based on the codeword, has to recover the original message within the required distortion and reliability. The model of such an information transmission system is depicted in Figure 5.1 (cf. Figure 2.1).
(cf. Figure 2.1).
The DMS $X$ is defined as a sequence $\{X_i\}_{i=1}^{\infty}$ of discrete independent identically distributed (i.i.d.) RVs taking values in the finite set $\mathcal{X}$, which is the alphabet of messages of the source. Let
$$P^* = \{P^*(x),\ x \in \mathcal{X}\} \qquad (5.1)$$
be the generating PD of the source messages. Since we are interested in memoryless sources, the probability $P^{*N}(\mathbf{x})$ of the $N$-length vector of $N$ successive messages $\mathbf{x} = (x_1, x_2, \dots, x_N) \in \mathcal{X}^N$ is defined as the

Fig. 5.1 Noiseless communication system.



product of the component probabilities:
$$P^{*N}(\mathbf{x}) \triangleq \prod_{n=1}^N P^*(x_n). \qquad (5.2)$$

The finite set $\hat{\mathcal{X}}$, in general different from $\mathcal{X}$, is the reproduction alphabet at the receiver. Let
$$d : \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty) \qquad (5.3)$$
be the given fidelity (distortion) criterion between source original and reconstruction messages. The distortion measure for sequences $\mathbf{x} \in \mathcal{X}^N$ and $\hat{\mathbf{x}} \in \hat{\mathcal{X}}^N$ is assumed to be the average of the components' distortions:
$$d(\mathbf{x}, \hat{\mathbf{x}}) \triangleq \frac{1}{N} \sum_{n=1}^N d(x_n, \hat{x}_n). \qquad (5.4)$$
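Both (5.2) and (5.4) are one-liners in practice; the only numerical caution is that $P^{*N}(\mathbf{x})$ underflows for large $N$ and is better handled in the log domain. A minimal Python sketch (ours):

```python
import numpy as np

def avg_distortion(x, x_hat, d):
    """d(x, x_hat) of (5.4): mean per-letter distortion, d given as d[x, xh]."""
    return d[x, x_hat].mean()

def log_prob(x, P_star):
    """log P*^N(x) of (5.2), computed in the log domain so that
    long sequences do not underflow."""
    return np.log(P_star[x]).sum()

# Example: Hamming distortion on a binary alphabet.
d_hamming = 1.0 - np.eye(2)
x = np.array([0, 1, 1, 0]); xh = np.array([0, 1, 0, 0])
print(avg_distortion(x, xh, d_hamming))   # 0.25
```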

We call a code, denoted $(f, g)$, a family of two mappings: an encoding
$$f : \mathcal{X}^N \to \{1, 2, \dots, L(N)\},$$
and a decoding
$$g : \{1, 2, \dots, L(N)\} \to \hat{\mathcal{X}}^N,$$
where $L(N)$ is the volume of the code.


The task of the system is to ensure restoration of the source mes-
sages at the receiver within a given distortion level ∆ and with a “small”
error probability. Our problem is to estimate the minimum of the code
volume sufficient for realizing this task.
For a given distortion level $\Delta \ge 0$, we consider the set of satisfactorily transmitted vectors of messages:
$$A \triangleq \{\mathbf{x} \in \mathcal{X}^N :\ g(f(\mathbf{x})) = \hat{\mathbf{x}},\ d(\mathbf{x}, \hat{\mathbf{x}}) \le \Delta\}.$$
The error probability $e(f, g, P^*, \Delta, N)$ of the code $(f, g)$ for the source PD $P^*$, given $\Delta$ and $N$, is
$$e(f, g, P^*, \Delta, N) \triangleq 1 - P^{*N}(A).$$

Definition 5.1. A number $R \ge 0$ is called a $\Delta$-achievable rate for given PD $P^*$ and $\Delta \ge 0$ if for every $\varepsilon > 0$ and sufficiently large $N$ there exists a code $(f, g)$ such that
$$\frac{1}{N} \log L(N) \le R + \varepsilon, \qquad (5.5)$$
$$e(f, g, P^*, \Delta, N) \le \varepsilon. \qquad (5.6)$$

The minimum of these ∆-achievable rates defines the Shannon rate-


distortion function [194]. We denote it by R(∆, P ∗ ). The properties of
the rate-distortion function are well studied, and they can be found in
the books by Berger [26], Csiszár and Körner [51], Cover and Thomas
[48], and in the paper [9] by Ahlswede.
Quite logically, we could replace (5.6) by the average distortion constraint
$$\sum_{\mathbf{x} \in \mathcal{X}^N} P^{*N}(\mathbf{x})\, d(\mathbf{x}, g(f(\mathbf{x}))) \le \Delta.$$

However, for discrete memoryless source models this approach leads to the same performance limit, as noted in Section 2.2 of the book [51].
We study the achievability of a coding rate involving, in addition
to the distortion criterion, the requirement that the error probability
exponentially decreases with given reliability E as N → ∞.

Definition 5.2. A number $R \ge 0$ is called an $(E, \Delta)$-achievable rate for a given PD $P^*$, $E > 0$, and $\Delta \ge 0$ if for every $\varepsilon > 0$, $\delta > 0$, and sufficiently large $N$ there exists a code $(f, g)$ such that
$$\frac{1}{N} \log L(N) \le R + \varepsilon \qquad (5.7)$$
and the error probability is exponentially small:
$$e(f, g, P^*, \Delta, N) \le \exp\{-N(E - \delta)\}. \qquad (5.8)$$



It is clear that if R is (E, ∆)-achievable then any real number larger


than R is also (E, ∆)-achievable.
Denote by R(E, ∆, P ∗ ) the minimum (E, ∆)-achievable rate for
given PD P ∗ and call it the rate-reliability-distortion function.
Before the detailed consideration of the rate-reliability approach in source coding with a fidelity criterion, we recall the result from the investigation of the inverse function. The reliability-rate-distortion dependence was studied by Marton [171], who showed that if $R > R(\Delta, P^*)$ then there exist $N$-length block codes $(f, g)$ such that, as $N$ grows, the rate of the code approaches $R$:
$$N^{-1} \log L(N) \to R,$$
and the error probability $e(f, g, P^*, \Delta, N)$ converges to zero exponentially with the exponent
$$F(R, \Delta, P^*) = \inf_{P :\, R(\Delta, P) > R} D(P \,\|\, P^*). \qquad (5.9)$$

In addition, this is the optimal dependence between R and E, due to


the following theorem.

Theorem 5.1. (Marton [171]) Let $d$ be a distortion measure on $\mathcal{X} \times \hat{\mathcal{X}}$ and $R < \log |\mathcal{X}|$. Then there exist block codes $(f, g)$ such that, with growing $N$ ($N \ge N_0(|\mathcal{X}|, \delta)$),
(i) $N^{-1} \log L(N) \to R$,
(ii) for every PD $P^*$ on $\mathcal{X}$, $\Delta \ge 0$, and each $\delta > 0$,
$$N^{-1} \log e(f, g, P^*, \Delta, N) \le -F(R, \Delta, P^*) + \delta,$$
and, furthermore, for every coding procedure satisfying (i),
$$\liminf_{N \to \infty} N^{-1} \log e(f, g, P^*, \Delta, N) \ge -F(R, \Delta, P^*).$$

The properties of the error exponent function $F(R, \Delta, P^*)$ are discussed in [171] and [9]. In particular, Marton [171] raised the question of the continuity of that function in $R$, while Ahlswede [9] gave the full solution to this problem, proving the discontinuity of $F(R, \Delta, P^*)$ for general distortion measures other than the Hamming distance.
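For a Bernoulli source with Hamming distortion, where $R(\Delta, P)$ has the closed form $h(p) - h(\Delta)$ for $\Delta < \min(p, 1-p)$, the exponent (5.9) can be approximated by a grid search. A self-contained Python sketch (ours; binary case only, logarithms in bits):

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    p = float(p)
    return 0.0 if p in (0.0, 1.0) else -p*np.log2(p) - (1-p)*np.log2(1-p)

def rd_binary(delta, p):
    """R(Delta, p) for a Bernoulli(p) source with Hamming distortion."""
    q = min(p, 1 - p)
    return h2(q) - h2(delta) if delta < q else 0.0

def kl_binary(p, q):
    """D(p || q) in bits for Bernoulli parameters (0 < q < 1 assumed)."""
    t = 0.0
    if p > 0: t += p * np.log2(p / q)
    if p < 1: t += (1 - p) * np.log2((1 - p) / (1 - q))
    return t

def marton_exponent(R, delta, p_star, grid=10001):
    """F(R, Delta, P*) of (5.9) by grid search over Bernoulli PDs."""
    ps = np.linspace(0.0, 1.0, grid)
    feas = [kl_binary(p, p_star) for p in ps if rd_binary(delta, p) > R]
    return min(feas) if feas else np.inf

print(marton_exponent(R=0.3, delta=0.1, p_star=0.1))
```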

We proceed with a discussion of the notion of (E, ∆)-achievability,


interpreting some properties of the function R(E, ∆, P ∗ ).

Lemma 5.2. (i) Every (E, ∆)-achievable rate is also ∆-achievable.


(ii) R(E, ∆, P ∗ ) is a non-decreasing function in E (for each fixed
∆ ≥ 0).

The proof is an evident application of Definitions 5.1 and 5.2.

Theorem 5.3. For any PD $P^*$,
$$\lim_{E \to 0} R(E, \Delta, P^*) = R(\Delta, P^*). \qquad (5.10)$$

Proof. It follows from Lemma 5.2 that for any fixed $\Delta \ge 0$ and $P^*$, for every $0 < E_2 \le E_1$ the following inequalities hold:
$$R(\Delta, P^*) \le R(E_2, \Delta, P^*) \le R(E_1, \Delta, P^*).$$
As $E \to 0$, $R(E, \Delta, P^*)$ is monotonically non-increasing and lower bounded by $R(\Delta, P^*)$, and must therefore have a limit. We have to prove that this limit is $R(\Delta, P^*)$. According to Theorem 5.1, every $R > R(\Delta, P^*)$ is an $(E, \Delta)$-achievable rate for
$$E = \inf_{P :\, R(\Delta, P) > R} D(P \,\|\, P^*).$$
If $R - R(\Delta, P^*) < \delta$ then $R(E, \Delta, P^*) - R(\Delta, P^*) < \delta$, since $R(E, \Delta, P^*) \le R$, whence (5.10) follows.

In the next part of this section we introduce auxiliary notation and present Theorem 5.4, which gives the analytical form of the rate-reliability-distortion function [124].
Let
$$Q \triangleq \{Q(\hat{x} \,|\, x),\ x \in \mathcal{X},\ \hat{x} \in \hat{\mathcal{X}}\}$$
be a conditional PD on $\hat{\mathcal{X}}$ for given $x$.



Consider the following set of distributions:
$$\alpha(E, P^*) \triangleq \{P : D(P \,\|\, P^*) \le E\}. \quad (5.11)$$
Let Q(P, ∆) be the set of all conditional PDs $Q_P(\hat{x} \mid x) \triangleq Q_P$, corresponding to the PD P , for which the following condition on the expectation of distortion holds:
$$\mathbb{E}_{P,Q_P} d(X, \hat{X}) \triangleq \sum_{x, \hat{x}} P(x)\, Q_P(\hat{x} \mid x)\, d(x, \hat{x}) \le \Delta. \quad (5.12)$$

The solution of the main problem of this section is presented in the following theorem.

Theorem 5.4. For every E > 0, ∆ ≥ 0,
$$R(E, \Delta, P^*) = \max_{P \in \alpha(E, P^*)}\; \min_{Q_P \in Q(P, \Delta)} I_{P,Q_P}(X \wedge \hat{X}). \quad (5.13)$$

The conformity with rate-distortion theory is stated by

Corollary 5.1. When E → 0 we obtain the rate-distortion function [26, 51, 194]:
$$R(\Delta, P^*) = \min_{Q_{P^*}:\, \mathbb{E}_{P^*,Q_{P^*}} d(X,\hat{X}) \le \Delta} I_{P^*,Q_{P^*}}(X \wedge \hat{X}). \quad (5.14)$$

Then, another basic result of Shannon theory is obtained by setting the receiver's distortion requirement to zero.

Corollary 5.2. When ∆ = 0 (provided that $\mathcal{X} \equiv \hat{\mathcal{X}}$ with d(x, x̂) = 0 for x = x̂ and d(x, x̂) = 1 otherwise, i.e., d is the Hamming measure), then
$$R(0, P^*) = H_{P^*}(X), \quad (5.15)$$
since now only the identity matrix satisfies the condition on the expectation under the minimum in (5.14). This is the solution to the lossless

source coding problem solved by Shannon in [191], which established the crucial role of entropy in information theory.
Note that an equivalent representation of (5.13) in terms of the rate-distortion function is the following one:
$$R(E, \Delta, P^*) = \max_{P \in \alpha(E, P^*)} R(\Delta, P). \quad (5.16)$$

Taking into account these two facts, namely (5.15) and (5.16), it is easy to deduce the minimum asymptotic rate sufficient for lossless (zero-distortion) transmission of the source under the reliability requirement. Let us denote by R(E, P ∗ ) the special case of the rate-reliability-distortion function R(E, ∆, P ∗ ) for ∆ = 0 and the generic PD P ∗ , and call it the rate-reliability function in source coding.

Corollary 5.3. For every E > 0 and a fixed PD P ∗
$$R(E, P^*) \triangleq R(E, 0, P^*) = \max_{P \in \alpha(E, P^*)} H_P(X). \quad (5.17)$$
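For a binary source, (5.17) admits a direct numerical evaluation. The sketch below is our own illustration (helper names and the bisection are assumptions, not the authors' code; logarithms in bits): it finds the boundary PD of α(E, P ∗ ) closest to the equiprobable one, anticipating the monotonicity argument of Lemma 5.10 below.

```python
import math

def h(p):                                    # binary entropy in bits
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

def dkl(p, q):                               # D({p,1-p} || {q,1-q}) in bits
    t = lambda a, b: 0.0 if a == 0.0 else a * math.log2(a / b)
    return t(p, q) + t(1 - p, 1 - q)

def rate_reliability(E, pstar, tol=1e-12):
    """R(E, P*) = max of H_P(X) over alpha(E, P*) for a binary source, cf. (5.17)."""
    p = min(pstar, 1 - pstar)                # relabeling 0 <-> 1 leaves R unchanged
    if dkl(0.5, p) <= E:                     # equiprobable PD lies inside alpha(E, P*)
        return 1.0
    lo, hi = p, 0.5                          # divergence grows from p toward 1/2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if dkl(mid, p) <= E else (lo, mid)
    return h(lo)                             # entropy of the boundary PD P_E

print(rate_reliability(0.1, 0.15))           # rate needed at reliability E = 0.1
```

The maximizing PD is the one inside the divergence ball that is nearest to the equiprobable distribution, where the entropy is largest.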

Corollary 5.4. Inverse to R(E, P ∗ ) is the best error exponent function
$$E(R, P^*) = \min_{P:\, H_P(X) \ge R} D(P \,\|\, P^*), \quad (5.18)$$
specialized from (5.9) for the zero-distortion argument, bearing in mind the continuity of the function in R for Hamming distortion measures.

Corollary 5.5. As E → ∞, Theorem 5.4 implies that the minimum asymptotic rate R(∆) for coding all vectors from $\mathcal{X}^N$, each being reconstructed within the required distortion threshold ∆, is
$$R(\Delta) = \max_{P \in \mathcal{P}(\mathcal{X})} R(\Delta, P). \quad (5.19)$$
This is the so-called "zero-error" rate-distortion function [49, 51]. It is worth mentioning that it is the same for any P ∗ .

5.3 Proofs, Covering Lemma

To prove Theorem 5.4 we apply the method of types (see Section 1.4). We perform the direct part of the proof using the following random coding lemma about coverings of types of vectors.

Lemma 5.5. Let for ε > 0
$$J(P, Q) = \exp\{N(I_{P,Q}(X \wedge \hat{X}) + \varepsilon)\}. \quad (5.20)$$
Then for every type P and conditional type Q there exists a collection of vectors
$$\{\hat{\mathbf{x}}_j \in T_{P,Q}^N(\hat{X}),\; j = 1, \dots, J(P, Q)\}$$
such that the set
$$\{T_{P,Q}^N(X \mid \hat{\mathbf{x}}_j),\; j = 1, \dots, J(P, Q)\}$$
covers $T_P^N(X)$ for N large enough, that is,
$$T_P^N(X) \subset \bigcup_{j=1}^{J(P,Q)} T_{P,Q}^N(X \mid \hat{\mathbf{x}}_j).$$

Proof. Using the method of random selection, similarly to the proof of Lemma 2.4.1 in [51], we show the existence of a covering of $T_P^N(X)$. Let $\{Z_j,\; j = 1, \dots, J(P, Q)\}$ be a collection of i.i.d. RVs with values on $T_{P,Q}^N(\hat{X})$. Denote by ψ(x) the characteristic function of the complement of the random set $\bigcup_{j=1}^{J(P,Q)} T_{P,Q}^N(X \mid Z_j)$:
$$\psi(\mathbf{x}) \triangleq \begin{cases} 1, & \text{if } \mathbf{x} \notin \bigcup_{j=1}^{J(P,Q)} T_{P,Q}^N(X \mid Z_j), \\ 0, & \text{otherwise.} \end{cases}$$
It is sufficient to show that for N large enough a selection exists such that
$$\Pr\Biggl\{\sum_{\mathbf{x} \in T_P^N(X)} \psi(\mathbf{x}) < 1\Biggr\} > 0,$$

since it is equivalent to the existence of the required covering. We have
$$\Pr\Biggl\{\sum_{\mathbf{x} \in T_P^N(X)} \psi(\mathbf{x}) \ge 1\Biggr\} \le |T_P^N(X)| \,\Pr\Biggl\{\mathbf{x} \notin \bigcup_{j=1}^{J(P,Q)} T_{P,Q}^N(X \mid Z_j)\Biggr\}.$$
Taking into account the independence of the RVs $Z_j$, $j = 1, \dots, J(P, Q)$, and the polynomial estimate for the number of conditional types (see Lemma 1.1), we have for N large enough
$$\begin{aligned}
\Pr\Biggl\{\mathbf{x} \notin \bigcup_{j=1}^{J(P,Q)} T_{P,Q}^N(X \mid Z_j)\Biggr\}
&\le \Bigl(1 - |T_{P,Q}^N(X \mid \hat{\mathbf{x}}_j)|\, |T_P^N(X)|^{-1}\Bigr)^{J(P,Q)} \\
&\le \bigl(1 - \exp\{N H_{P,Q}(X \mid \hat{X}) - |\mathcal{X}||\hat{\mathcal{X}}| \log(N+1) - N H_P(X)\}\bigr)^{J(P,Q)} \\
&\le \bigl(1 - \exp\{-N(I_{P,Q}(X \wedge \hat{X}) + \varepsilon/2)\}\bigr)^{J(P,Q)},
\end{aligned}$$
where
$$\varepsilon \ge N^{-1} |\mathcal{X}||\hat{\mathcal{X}}| \log(N+1).$$
Applying the inequality $(1-t)^s \le \exp\{-st\}$ (which holds for every s and 0 < t < 1) with
$$t = \exp\{-N(I_{P,Q}(X \wedge \hat{X}) + \varepsilon/2)\}, \qquad s = J(P, Q),$$
we continue the estimation:
$$\begin{aligned}
\Pr\Biggl\{\sum_{\mathbf{x} \in T_P^N(X)} \psi(\mathbf{x}) \ge 1\Biggr\}
&\le \exp\{N H_P(X)\} \exp\{-J(P, Q) \exp\{-N(I_{P,Q}(X \wedge \hat{X}) + \varepsilon/2)\}\} \\
&= \exp\{N H_P(X) - \exp\{N \varepsilon/2\}\}.
\end{aligned}$$
Whence, when N is large enough,
$$\Pr\Biggl\{\sum_{\mathbf{x} \in T_P^N(X)} \psi(\mathbf{x}) \ge 1\Biggr\} < 1.$$
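The strength of the final bound can be appreciated numerically. The following back-of-the-envelope computation is ours (with exp and log taken to base 2 and illustrative parameter values): it shows how the exponent N H_P(X) − exp{N ε/2} of the failure probability turns negative, after which a covering of size J(P, Q) must exist.

```python
# Illustrative values: H_P(X) in bits and the slack eps from (5.20).
H_P, eps = 0.8, 0.05
for N in (100, 400, 800):
    exponent = N * H_P - 2 ** (N * eps / 2)   # log2 of 2^{N H_P - 2^{N eps/2}}
    print(N, exponent)    # once negative, Pr{no covering} < 1, so a covering exists
```

The linear term N H_P(X) is eventually overwhelmed by the doubly exponential term, which is why only polynomially few extra codewords beyond the mutual information rate are needed.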

Now we are ready to expose the proof of Theorem 5.4.

Proof. Denote by R′(E, ∆, P ∗ ) the expression on the right-hand side of (5.13). The proof of the inequality
$$R'(E, \Delta, P^*) \ge R(E, \Delta, P^*) \quad (5.21)$$
is based upon the idea of "importance" of the source messages of those types P which are not farther than E (in the sense of divergence) from the generating PD P ∗ .

Let us represent the set of all source messages of length N as a union of all disjoint types of vectors:
$$\mathcal{X}^N = \bigcup_{P \in \mathcal{P}_N(\mathcal{X})} T_P^N(X). \quad (5.22)$$

Let δ be a positive number. For N large enough we can estimate the probability of appearance of source sequences of types beyond α(E + δ, P ∗ ):
$$\begin{aligned}
P^{*N}\Biggl(\bigcup_{P \notin \alpha(E+\delta, P^*)} T_P^N(X)\Biggr)
&= \sum_{P \notin \alpha(E+\delta, P^*)} P^{*N}(T_P^N(X)) \\
&\le (N+1)^{|\mathcal{X}|} \exp\Bigl\{-N \min_{P \notin \alpha(E+\delta, P^*)} D(P \,\|\, P^*)\Bigr\} \\
&\le \exp\{-NE - N\delta + |\mathcal{X}| \log(N+1)\} \le \exp\{-N(E + \delta/2)\}. \quad (5.23)
\end{aligned}$$
Here the first and the second inequalities follow from the well-known properties of types and the definition of the set α(E, P ∗ ). Consequently, in order to obtain the desired level of the error probability e(f, g, P ∗ , ∆, N ), it is sufficient to construct a "good" encoding function for the vectors with types P from α(E + δ, P ∗ ).

Let us pick some type P ∈ α(E + δ, P ∗ ) and some $Q_P \in Q(P, \Delta)$. Let
$$C(P, Q_P, j) = T_{P,Q_P}^N(X \mid \hat{\mathbf{x}}_j) - \bigcup_{j' < j} T_{P,Q_P}^N(X \mid \hat{\mathbf{x}}_{j'}), \quad j = 1, \dots, J(P, Q_P).$$
We define a code (f, g) for the message vectors of the type P with the encoding
$$f(\mathbf{x}) = \begin{cases} j, & \text{when } \mathbf{x} \in C(P, Q_P, j),\; P \in \alpha(E + \delta, P^*), \\ j_0, & \text{when } \mathbf{x} \in T_P^N(X),\; P \notin \alpha(E + \delta, P^*), \end{cases}$$

and the decoding:
$$g(j) = \hat{\mathbf{x}}_j, \qquad g(j_0) = \hat{\mathbf{x}}_0,$$
where $j_0$ is a fixed number and $\hat{\mathbf{x}}_0$ is a fixed reconstruction vector. So it is not difficult to see that in our coding scheme an error occurs only when $j_0$ was sent by the coder.

According to the definition of the code (f, g), to Lemma 5.5, and to the inequality (5.12), we have for P ∈ α(E + δ, P ∗ )
$$d(\mathbf{x}, \hat{\mathbf{x}}_j) = N^{-1} \sum_{x, \hat{x}} n(x, \hat{x} \mid \mathbf{x}, \hat{\mathbf{x}}_j)\, d(x, \hat{x}) = \sum_{x, \hat{x}} P(x)\, Q_P(\hat{x} \mid x)\, d(x, \hat{x}) = \mathbb{E}_{P,Q_P} d(X, \hat{X}) \le \Delta, \quad j = 1, \dots, J(P, Q_P).$$
For a fixed type P and a corresponding conditional type $Q_P$, the number of vectors $\hat{\mathbf{x}}$ used in the encoding, denoted by $L_{P,Q_P}(N)$, is
$$L_{P,Q_P}(N) = \exp\{N(I_{P,Q_P}(X \wedge \hat{X}) + \varepsilon)\}.$$

The "worst" types among α(E + δ, P ∗ ), the number of which has a polynomial estimate, and their optimal (among Q(P, ∆)) conditional distributions $Q_P$ determine the corresponding bound for the transmission rate:
$$N^{-1} \log L_{P,Q_P}(N) - \varepsilon - N^{-1} |\mathcal{X}| \log(N+1) \le \max_{P \in \alpha(E+\delta, P^*)}\; \min_{Q_P \in Q(P, \Delta)} I_{P,Q_P}(X \wedge \hat{X}). \quad (5.24)$$
Taking into account the arbitrariness of ε and δ and the continuity of the information expression (5.24) in E, we get (5.21).
Now we pass to the demonstration of the inverse inequality
$$R'(E, \Delta, P^*) \le R(E, \Delta, P^*). \quad (5.25)$$
Let ε > 0 be fixed. Consider a code (f, g) for each blocklength N with (E, ∆)-achievable rate R. It is necessary to show that for some $Q_P(\hat{x} \mid x) \in Q(P, \Delta)$ the following inequality holds for N large enough:
$$N^{-1} \log L(N) + \varepsilon \ge \max_{P \in \alpha(E, P^*)} I_{P,Q_P}(X \wedge \hat{X}). \quad (5.26)$$

Let the complement of the set A be A′, i.e., $A' = \mathcal{X}^N - A$. We can write
$$A \cap T_P^N(X) = T_P^N(X) - A' \cap T_P^N(X).$$
For P ∈ α(E − ε, P ∗ ), provided N is large enough, the following estimates hold:
$$|A' \cap T_P^N(X)| = \frac{P^{*N}(A' \cap T_P^N(X))}{P^{*N}(\mathbf{x})} \le \exp\{N(H_P(X) + D(P \,\|\, P^*))\} \exp\{-N(E - \varepsilon)\} \le \exp\{N(H_P(X) - \varepsilon)\}.$$
Whence
$$\begin{aligned}
|A \cap T_P^N(X)| &\ge (N+1)^{-|\mathcal{X}|} \exp\{N H_P(X)\} - \exp\{N(H_P(X) - \varepsilon)\} \\
&= \exp\{N(H_P(X) - \varepsilon)\} \Biggl(\frac{\exp\{N \varepsilon\}}{(N+1)^{|\mathcal{X}|}} - 1\Biggr) \\
&\ge \exp\{N(H_P(X) - \varepsilon)\} \quad (5.27)
\end{aligned}$$
for N sufficiently large.
To each $\mathbf{x} \in A \cap T_P^N(X)$ a unique vector $\hat{\mathbf{x}}$ corresponds, such that $\hat{\mathbf{x}} = g(f(\mathbf{x}))$. This vector determines a conditional type Q, for which
$$\hat{\mathbf{x}} \in T_{P,Q}^N(\hat{X} \mid \mathbf{x}).$$
Since $\mathbf{x} \in A$, then $\mathbb{E}_{P,Q} d(X, \hat{X}) = d(\mathbf{x}, \hat{\mathbf{x}}) \le \Delta$. So Q ∈ Q(P, ∆).
The set of all vectors $\mathbf{x} \in A \cap T_P^N(X)$ is divided into classes corresponding to these conditional types Q. Let us select the class having maximum cardinality for the given P and denote it by
$$\Bigl(A \cap T_P^N(X)\Bigr)_{Q_P}.$$
Using the polynomial upper estimate (Lemma 1.1) for the number of conditional types Q, we have for N sufficiently large
$$|A \cap T_P^N(X)| \le (N+1)^{|\mathcal{X}||\hat{\mathcal{X}}|}\, \Bigl|\Bigl(A \cap T_P^N(X)\Bigr)_{Q_P}\Bigr| \le \exp\{N \varepsilon/2\}\, \Bigl|\Bigl(A \cap T_P^N(X)\Bigr)_{Q_P}\Bigr|. \quad (5.28)$$

Let D be the set of all vectors $\hat{\mathbf{x}}$ such that $g(f(\mathbf{x})) = \hat{\mathbf{x}}$ for some
$$\mathbf{x} \in A \cap T_P^N(X) \cap T_{P,Q_P}^N(X \mid \hat{\mathbf{x}}).$$
In accordance with the definition of the code, |D| ≤ L(N ). Then
$$\Bigl|\Bigl(A \cap T_P^N(X)\Bigr)_{Q_P}\Bigr| \le \sum_{\hat{\mathbf{x}} \in D} |T_{P,Q_P}^N(X \mid \hat{\mathbf{x}})| \le L(N) \exp\{N H_{P,Q_P}(X \mid \hat{X})\}.$$
From the last inequality, (5.27), and (5.28) it is not difficult to arrive at the inequality
$$L(N) \ge \exp\{N(I_{P,Q_P}(X \wedge \hat{X}) - \varepsilon)\}$$
for any P ∈ α(E − ε, P ∗ ). This affirms the inequality (5.26) and hence (5.25).

5.4 Some Properties and Binary Hamming


Rate-Reliability-Distortion Function
Now we formulate some basic properties of R(E, ∆, P ∗ ) and specify it
for an important class of sources.
Ahlswede [9] showed that while Marton's exponent function F (R, ∆, P ∗ ) is continuous in R (where R(∆, P ∗ ) < R < R(∆)) for Hamming distortion measures, in general it is discontinuous in R. In contrast to this fact we have the following result.

Lemma 5.6. R(E, ∆, P ∗ ) is continuous in E.

Proof. First note that α(E, P ∗ ) is a convex set, that is, if P ′ ∈ α(E, P ∗ ) and P ″ ∈ α(E, P ∗ ), then
$$\lambda P' + (1 - \lambda) P'' \in \alpha(E, P^*),$$
because
$$D(\lambda P' + (1 - \lambda) P'' \,\|\, P^*) \le \lambda D(P' \,\|\, P^*) + (1 - \lambda) D(P'' \,\|\, P^*) \le \lambda E + (1 - \lambda) E = E,$$
due to the convexity of D(P ‖ P ∗ ) in the pair (P, P ∗ ), which means that it is a convex function in P for a fixed P ∗ .

Hence, taking into account the continuity of R(∆, P ∗ ) in P ∗ , the continuity of R(E, ∆, P ∗ ) in E follows from (5.16).

It is well known [51] that the rate-distortion function is a non-increasing and convex function of ∆. Let us see that the property of convexity in ∆ is preserved for the rate-reliability-distortion function.

Lemma 5.7. R(E, ∆, P ∗ ) is a convex function of ∆.

Proof. For fixed E let the points (∆1 , R1 ) and (∆2 , R2 ) belong to the curve of R(E, ∆, P ∗ ) and ∆1 ≤ ∆2 . We shall prove that for every λ from (0, 1),
$$R(E, \lambda \Delta_1 + (1 - \lambda) \Delta_2, P^*) \le \lambda R(E, \Delta_1, P^*) + (1 - \lambda) R(E, \Delta_2, P^*).$$
Consider for any fixed PD P ∗ the rate-distortion function (5.14). Using the fact that the rate-distortion function R(∆, P ∗ ) is convex in ∆, one can readily deduce
$$\begin{aligned}
R(E, \lambda \Delta_1 + (1 - \lambda) \Delta_2, P^*)
&= \max_{P \in \alpha(E, P^*)} R(\lambda \Delta_1 + (1 - \lambda) \Delta_2, P) \\
&\le \max_{P \in \alpha(E, P^*)} \bigl(\lambda R(\Delta_1, P) + (1 - \lambda) R(\Delta_2, P)\bigr) \\
&\le \lambda \max_{P \in \alpha(E, P^*)} R(\Delta_1, P) + (1 - \lambda) \max_{P \in \alpha(E, P^*)} R(\Delta_2, P) \\
&= \lambda R(E, \Delta_1, P^*) + (1 - \lambda) R(E, \Delta_2, P^*).
\end{aligned}$$

In the sequel, elaborating the binary Hamming rate-reliability-distortion function, we shall conclude that R(E, ∆, P ∗ ) is not convex in E.

However, before dealing with that example, it is relevant to draw attention to the properties of the rate-reliability function R(E, P ∗ ), especially in the argument E. In this regard we have the following lemma.

Lemma 5.8. R(E, P ∗ ) is concave in E for a fixed PD P ∗ .

Proof. In (5.17) the maximization is taken over the convex set α(E, P ∗ ). From the concavity of $H_P(X)$ in P it follows that the maximum is attained at the boundary of α(E, P ∗ ), that is, at those P for which D(P ‖ P ∗ ) = E, unless the equiprobable PD (1/|X |, 1/|X |, . . . , 1/|X |), at which the entropy attains its maximum value, belongs to α(E, P ∗ ).

Let E1 and E2 , with 0 < E1 < E2 , be arbitrary values from the definitional domain (0, ∞) of R(E, P ∗ ), and let
$$R(E_1, P^*) = H_{P_{E_1}}(X), \quad (5.29)$$
$$R(E_2, P^*) = H_{P_{E_2}}(X) \quad (5.30)$$
be the values of the rate-reliability function for E1 and E2 , respectively, where $P_{E_i}$, i = 1, 2, are those PDs that maximize the entropy in (5.17) over $\alpha(E_i, P^*)$, correspondingly, and therefore satisfy the condition $D(P_{E_i} \| P^*) = E_i$.

For a real λ, 0 < λ < 1, taking into account (5.29) and (5.30), the following chain of reasoning yields the desired result:
$$\begin{aligned}
R(\lambda E_1 + (1 - \lambda) E_2, P^*) &= \max_{P \in \alpha(\lambda E_1 + (1 - \lambda) E_2, P^*)} H_P(X) = H_{P_{\lambda E_1 + (1 - \lambda) E_2}}(X) \\
&\overset{(a)}{\ge} H_{\lambda P_{E_1} + (1 - \lambda) P_{E_2}}(X) \\
&\overset{(b)}{\ge} \lambda H_{P_{E_1}}(X) + (1 - \lambda) H_{P_{E_2}}(X) \\
&= \lambda R(E_1, P^*) + (1 - \lambda) R(E_2, P^*),
\end{aligned}$$
where $D(P_{\lambda E_1 + (1 - \lambda) E_2} \| P^*) = \lambda E_1 + (1 - \lambda) E_2$, the inequality (a) follows from the inequality
$$D(\lambda P_{E_1} + (1 - \lambda) P_{E_2} \,\|\, P^*) \le \lambda D(P_{E_1} \,\|\, P^*) + (1 - \lambda) D(P_{E_2} \,\|\, P^*) = \lambda E_1 + (1 - \lambda) E_2,$$
and the inequality (b) follows from the concavity of the entropy. The lemma is proved.
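Lemma 5.8 is easy to check numerically for a binary source with a crude grid evaluation of (5.17); the snippet below is our own illustration only, with arbitrary parameter values and base-2 logarithms.

```python
import math
h  = lambda p: 0.0 if p in (0, 1) else -p*math.log2(p) - (1-p)*math.log2(1-p)
dk = lambda p, q: sum(a*math.log2(a/b) for a, b in ((p, q), (1-p, 1-q)) if a > 0)

def R(E, ps, n=100000):                      # grid evaluation of (5.17)
    return max(h(i/n) for i in range(1, n) if dk(i/n, ps) <= E)

E1, E2, lam, ps = 0.02, 0.30, 0.4, 0.15
mid = R(lam*E1 + (1-lam)*E2, ps)             # value at the convex combination
chord = lam*R(E1, ps) + (1-lam)*R(E2, ps)    # the chord between the endpoints
print(mid >= chord - 1e-4)                   # True: the function lies above its chords
```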

Now let us turn to the consideration of an example promised above. For the binary source X with X = {0, 1}, PD P ∗ = {p∗ , 1 − p∗ } and Hamming distance
$$d(x, \hat{x}) \triangleq \begin{cases} 0, & \text{if } x = \hat{x}, \\ 1, & \text{if } x \ne \hat{x}, \end{cases}$$
we denote the rate-distortion function and the rate-reliability-distortion function by $R_{BH}(\Delta, P^*)$ and $R_{BH}(E, \Delta, P^*)$, respectively.

Let $H_\Delta(X)$, 0 ≤ ∆ ≤ 1, be the following binary entropy function:
$$H_\Delta(X) \triangleq -\Delta \log \Delta - (1 - \Delta) \log(1 - \Delta).$$
It is known (see [26, 48]) that
$$R_{BH}(\Delta, P^*) = \begin{cases} H_{P^*}(X) - H_\Delta(X), & 0 \le \Delta \le \min\{p^*, 1 - p^*\}, \\ 0, & \Delta > \min\{p^*, 1 - p^*\}. \end{cases} \quad (5.31)$$
Using the following shorthand,
$$p_{\max} \triangleq \max_{P \in \alpha(E, P^*)} \min\{p, 1 - p\},$$
the result below describes the binary Hamming rate-reliability-distortion function analytically [114].

Theorem 5.9. For every E > 0 and ∆ ≥ 0,
$$R_{BH}(E, \Delta, P^*) = \begin{cases} H_{P_E}(X) - H_\Delta(X), & \text{if } p^* \notin [\alpha_1, \alpha_2],\; \Delta \le p_{\max}, \\ 1 - H_\Delta(X), & \text{if } p^* \in [\alpha_1, \alpha_2],\; \Delta \le p_{\max}, \\ 0, & \text{if } \Delta > p_{\max}, \end{cases} \quad (5.32)$$
where
$$[\alpha_1, \alpha_2] \triangleq \Biggl[\frac{2^E - \sqrt{2^{2E} - 1}}{2^{E+1}},\; \frac{2^E + \sqrt{2^{2E} - 1}}{2^{E+1}}\Biggr] = \Biggl[\frac{1}{2}\Bigl(1 - \sqrt{1 - 2^{-2E}}\Bigr),\; \frac{1}{2}\Bigl(1 + \sqrt{1 - 2^{-2E}}\Bigr)\Biggr]$$
and $P_E = \{p_E, 1 - p_E\}$ is a PD on X that results in $D(P_E \| P^*) = E$.

Proof. From (5.16) and (5.31) it is apparent that
$$R_{BH}(E, \Delta, P^*) = \begin{cases} \max_{P \in \alpha(E, P^*)} (H_P(X) - H_\Delta(X)), & \Delta \le p_{\max}, \\ 0, & \Delta > p_{\max}. \end{cases}$$
Let 0 ≤ ∆ ≤ $p_{\max}$. We need to simplify the expression
$$\max_{P \in \alpha(E, P^*)} (H_P(X) - H_\Delta(X)) = \max_{P \in \alpha(E, P^*)} H_P(X) - H_\Delta(X).$$
Note that if the PD {1/2, 1/2} ∈ α(E, P ∗ ) then
$$\max_{P \in \alpha(E, P^*)} H_P(X) = 1,$$
which holds when $\frac{1}{2}\Bigl(\log \frac{1}{2p^*} + \log \frac{1}{2(1 - p^*)}\Bigr) \le E$, or
$$p^*(1 - p^*) \ge 2^{-2(E+1)}. \quad (5.33)$$
The condition (5.33) may be rewritten as the quadratic inequality
$$p^{*2} - p^* + 2^{-2(E+1)} \le 0,$$
which is equivalent to $p^* \in [\alpha_1, \alpha_2]$. Consequently, the value of $R_{BH}(E, \Delta, P^*)$ is constant and equal to $1 - H_\Delta(X)$ for all PDs from $[\alpha_1, \alpha_2]$, an interval which tends to the segment [0, 1] as E → ∞.
Now consider the case when $p^* \notin [\alpha_1, \alpha_2]$. We show that
$$\max_{P \in \alpha(E, P^*)} H_P(X) = H_{P_E}(X), \quad (5.34)$$
where $P_E = \{p_E, 1 - p_E\}$ is the solution of $D(P_E \| P^*) = E$, with $p_E$ taken as the value nearest to 1/2.

The equation (5.34) will be true by the following argument.

Lemma 5.10. The function
$$D(P \,\|\, P^*) = p \log \frac{p}{p^*} + (1 - p) \log \frac{1 - p}{1 - p^*}$$
is a monotone function of p for P ∈ α(E, P ∗ ) with p belonging to $\bigl[\min\{p^*, 1 - p^*\}, \frac{1}{2}\bigr]$.

Proof. Let $P_1 = \{p_1, 1 - p_1\}$ and $P_2 = \{p_2, 1 - p_2\}$ be binary PDs from α(E, P ∗ ) with $p^* \le p_1 \le p_2 \le \frac{1}{2}$. It is required to prove the inequality $D(P_2 \| P^*) \ge D(P_1 \| P^*)$. Since $p_1$ lies between $p^*$ and $p_2$, we can represent $P_1 = \lambda P^* + (1 - \lambda) P_2$ for some 0 < λ < 1, and as in the proof of Lemma 5.6 we obtain
$$D(P_1 \,\|\, P^*) = D(\lambda P^* + (1 - \lambda) P_2 \,\|\, P^*) \le \lambda D(P^* \,\|\, P^*) + (1 - \lambda) D(P_2 \,\|\, P^*) = (1 - \lambda) D(P_2 \,\|\, P^*) \le D(P_2 \,\|\, P^*).$$
Therefore, the lemma is proved (at the same time proving that $p_{\max} = p_E$) and (5.34) is obtained, which gives us (5.32).

An interesting and important fact to observe is that Theorem 5.9 and (5.32) show that for every binary PD P ∗ and distortion level ∆ ≥ 0 there exists a reliability value $E_{\max} > 0$ at which $R_{BH}(E, \Delta, P^*)$ reaches its maximum possible value $1 - H_\Delta(X)$ and remains constant afterwards. It means that there is a reasonable limit to the receiver's demand on the reliability: higher requirements can be satisfied asymptotically by the same code rate. This constant is the value of the rate-distortion function for the binary equiprobable source and Hamming distortion measure under the condition ∆ ≤ min{p∗ , 1 − p∗ }. Equation (5.19) supports this observation.
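The closed form (5.32) is easy to evaluate numerically, which also makes the saturation at $E_{\max}$ visible. The sketch below is our own illustration (the helper names and the bisection for $p_E$ are assumptions, not the authors' code); exp and log are taken to base 2.

```python
import math

def h(p):
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

def dkl(p, q):
    t = lambda a, b: 0.0 if a == 0.0 else a * math.log2(a / b)
    return t(p, q) + t(1 - p, 1 - q)

def p_E(E, pstar, tol=1e-12):
    """Solution of D(P_E || P*) = E nearest to 1/2 (cf. Lemma 5.10)."""
    p = min(pstar, 1 - pstar)
    if dkl(0.5, p) <= E:
        return 0.5
    lo, hi = p, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if dkl(mid, p) <= E else (lo, mid)
    return lo

def rbh(E, delta, pstar):
    """Binary Hamming rate-reliability-distortion function, per (5.32)."""
    a1 = 0.5 * (1 - math.sqrt(1 - 2 ** (-2 * E)))    # interval [alpha1, alpha2]
    pe = p_E(E, pstar)                               # here p_max = p_E (Lemma 5.10)
    if delta > pe:
        return 0.0
    if a1 <= pstar <= 1 - a1:                        # p* inside [alpha1, alpha2]
        return 1.0 - h(delta)
    return h(pe) - h(delta)

for E in (0.05, 0.2, 0.5, 1.0, 2.0):                 # saturation at E_max is visible
    print(E, rbh(E, 0.1, 0.15))
```

For p∗ = 0.15 and ∆ = 0.1, the printed values grow with E and then freeze at 1 − H_{0.1}(X) once p∗ falls inside [α1, α2], in accordance with the observation above.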
Some calculations and pictorial illustrations of the rate-reliability-
distortion function for the case considered in this section and more
general ones can be found in the next section, where we discuss mul-
titerminal configurations. In particular, Figures 6.2 and 6.6 represent
typical plots for RBH (E, ∆, P ∗ ).

Theorem 5.11. R(E, ∆, P ∗ ) is not convex in E.

Proof. The statement immediately follows from the characterization (5.32) of the rate-reliability-distortion function for the binary Hamming case: for any P ∗ one can easily choose a distortion level $\Delta > p_E$ such that $R_{BH}(E, \Delta, P^*)$ is identically zero on some interval of values of the reliability E. The corresponding graphical representations in Section 6.1 will add visual clarity to this argument.

The following remark introduces an issue highly pertinent to the convexity analysis in the rate-reliability-distortion area.

Remark 5.1. The time-sharing argument, a very efficient theoretical tool in rate-distortion theory, fails in rate-reliability-distortion theory. In other words, it is not true that "the convex combination of two points on the hyperplane of (E, ∆)-achievable rates is (E, ∆)-achievable." Figure 6.6 provides a one-dimensional illustration of that claim.

Nevertheless, the concavity of the binary Hamming rate-reliability-distortion function $R_{BH}(E, \Delta, P^*)$ in the reliability argument on the interval of its positiveness can be proved. That interval, under certain conditions, can coincide with the whole domain (0, ∞) of definition of this function.

Let $E_{\inf}(\Delta)$ be the infimum of reliability values E for which $\Delta \le p_E$, for given P ∗ and distortion level ∆ ≥ 0.

Lemma 5.12. $R_{BH}(E, \Delta, P^*)$ is concave in E on the interval of its positiveness $(E_{\inf}(\Delta), \infty)$.

Proof. First note that the condition $\Delta \le p_E$ provides the positiveness of the function $R_{BH}(E, \Delta, P^*)$ on $(E_{\inf}(\Delta), \infty)$. For fixed ∆ ≥ 0 and P ∗ there exists a value $E_{\max}$ of the reliability such that if $E \ge E_{\max}$, then $R_{BH}(E, \Delta, P^*)$ is constant and equals $1 - H_\Delta(X)$. Since $1 - H_\Delta(X)$ is the maximal value of the binary Hamming rate-reliability-distortion function, it remains to prove the concavity of $R_{BH}(E, \Delta, P^*)$ on the interval $(E_{\inf}(\Delta), E_{\max}]$.

Let $0 < E_1 < E_2$ with $E_i \in (E_{\inf}(\Delta), E_{\max}]$, i = 1, 2. It follows from (5.32) that
$$R_{BH}(E_1, \Delta, P^*) = H_{P_{E_1}}(X) - H_\Delta(X), \qquad R_{BH}(E_2, \Delta, P^*) = H_{P_{E_2}}(X) - H_\Delta(X),$$

where $D(P_{E_1} \| P^*) = E_1$ and $D(P_{E_2} \| P^*) = E_2$.

Putting $E_\lambda \triangleq \lambda E_1 + (1 - \lambda) E_2$, the proof looks almost identical to that of Lemma 5.8, videlicet, for 0 < λ < 1 we have
$$\begin{aligned}
R_{BH}(E_\lambda, \Delta, P^*) &= \max_{P \in \alpha(E_\lambda, P^*)} H_P(X) - H_\Delta(X) = H_{P_{E_\lambda}}(X) - H_\Delta(X) \\
&\overset{(a)}{\ge} H_{\lambda P_{E_1} + (1 - \lambda) P_{E_2}}(X) - H_\Delta(X) \\
&\overset{(b)}{\ge} \lambda H_{P_{E_1}}(X) + (1 - \lambda) H_{P_{E_2}}(X) - \lambda H_\Delta(X) - (1 - \lambda) H_\Delta(X) \\
&= \lambda (H_{P_{E_1}}(X) - H_\Delta(X)) + (1 - \lambda)(H_{P_{E_2}}(X) - H_\Delta(X)) \\
&= \lambda R_{BH}(E_1, \Delta, P^*) + (1 - \lambda) R_{BH}(E_2, \Delta, P^*),
\end{aligned}$$
where again $D(P_{E_\lambda} \| P^*) = E_\lambda$ and the inequalities (a) and (b) are valid due to the same arguments as those in the proof of Lemma 5.8.

Note that, in particular, $E_{\inf}(\Delta)$ can take the value 0, providing the concavity of the binary Hamming rate-reliability-distortion function $R_{BH}(E, \Delta, P^*)$ on the whole domain (0, ∞). The explicit form (5.32) of the function allows us to conclude that this always holds when $R_{BH}(\Delta, P^*) > 0$; when $R_{BH}(\Delta, P^*) = 0$, it holds under the condition $R_{BH}(E, \Delta, P^*) > R_{BH}(\Delta, P^*)$ for all values of E from (0, ∞). We will turn to some illustrations of this point in an example concerning the robust descriptions system elaborated in Section 6.1 of the next chapter.

5.5 Reliability Criterion in AVS Coding


The subject of this section is the error exponent criterion in lossy coding for a more general class of sources: arbitrarily varying sources (AVS). The paper [142] by Harutyunyan and Han Vinck solves the problem of specifying the AVS rate-reliability-distortion function and its dual, the AVS maximum error exponent function, in the setting of Section 5.2. The formulas derived there constitute more general results implying the main formulas discussed in the previous sections of this review. In particular, in the zero-distortion case these formulas specialize to the ones derived by Fu and Shen [77] from their hypothesis testing analysis for AVS.

The model of an AVS is more general than that of a DMS. In the former, the distribution of the source outputs depends on the source state, which varies within a finite set from one time instant to the next in an arbitrary manner.
Let X and S be finite sets, X assigned for the source alphabet and S for the source states, and let
$$\mathcal{P} \triangleq \{P_s,\; s \in \mathcal{S}\}$$
be a family of discrete PDs
$$P_s \triangleq \{P(x \mid s) : x \in \mathcal{X}\}$$
on X . The probability of an $\mathbf{x} \in \mathcal{X}^N$ subject to a sequence of source states $\mathbf{s}$ is determined by
$$P^N(\mathbf{x} \mid \mathbf{s}) \triangleq \prod_{n=1}^N P(x_n \mid s_n).$$
Therefore, the AVS defined by P is a sequence of RVs $\{X_i\}_{i=1}^\infty$ for which the PD of the N -length RV $\mathbf{X}$ is an unknown element of $\mathcal{P}^N$, the N th Cartesian power of P. Quite apparently, at each source state s ∈ S the AVS is a DMS with PD $P_s$.

The problem statement for AVS coding subject to fidelity and reliability criteria is totally identical to the DMS coding problem under the same constraints, the only difference being the source model. Therefore, keeping in mind the details explained in Section 5.2, such as the definitions of a distortion measure and a code (f, g), we only characterize the latter's probability of error as
$$e(f, g, \Delta) \triangleq 1 - \min_{\mathbf{s} \in \mathcal{S}^N} P^N(\mathcal{A} \mid \mathbf{s}),$$
assuming the source restoration within a ∆ threshold. The further concepts of (E, ∆)-achievable rates and the AVS rate-reliability-distortion function based upon the definition of e(f, g, ∆) are the analogs of the DMS ones.

A wide range of problems on the subject of AVS coding in the zero-distortion case are treated and solved in Ahlswede's papers [5] and [6]. The AVS coding problem under a distortion criterion is also known as the Berger source coding game [27]. We recall the classical result [51] (Theorem 2.4.3) on the AVS rate-distortion function R(∆) in the following theorem.

Theorem 5.13. Let the AVS be given by the family P. For any ∆ ≥ 0,
$$R(\Delta) = \max_{P \in \bar{\mathcal{P}}} R(\Delta, P), \quad (5.35)$$
where $\bar{\mathcal{P}}$ is the convex hull of P,
$$\bar{\mathcal{P}} \triangleq \Biggl\{\sum_{s \in \mathcal{S}} \lambda_s P_s,\; 0 \le \lambda_s \le 1,\; \sum_{s \in \mathcal{S}} \lambda_s = 1\Biggr\}.$$

In what follows we sum up, without proofs, the results from [142] on the error exponent analysis for AVS coding. [142] sets up the AVS rate-reliability-distortion function R(E, ∆).

Theorem 5.14. Let the AVS be given by the family P. For any E > 0 and ∆ ≥ 0,
$$R(E, \Delta) = \max_{P \in \bar{\mathcal{P}}} R(E, \Delta, P), \quad (5.36)$$
or, equivalently,
$$R(E, \Delta) = \max_{P \in \bar{\mathcal{P}}}\; \max_{Q:\, D(Q \| P) \le E} R(\Delta, Q). \quad (5.37)$$
(5.36) is the first general formula, which implies several others afterwards. Apparently, R(∆, P ) is easily derivable from it.

Corollary 5.6. As E → 0, R(E, ∆, P ) → R(∆, P ), and the AVS rate-distortion function (5.35) is obtained from (5.36).

The formula for the AVS maximum error exponent function E(R, ∆), inverse to R(E, ∆), is the second general result.

Theorem 5.15. For every ∆ ≥ 0, the best asymptotic error exponent E(R, ∆) for AVS coding is given by the formula (cf. the DMS version (5.9))
$$E(R, \Delta) = \inf_{P \in \bar{\mathcal{P}}}\; \inf_{Q:\, R(\Delta, Q) > R} D(Q \,\|\, P) \quad (5.38)$$
with the assumption
$$\max_{P \in \bar{\mathcal{P}}} R(\Delta, P) < R \le \log |\mathcal{X}|.$$

In the case of ∆ = 0, another impact of Theorem 5.14 on the theory is the AVS rate-reliability function $R_{AVS}(E)$ (cf. the DMS version (5.17)) for a given exponent E.

Theorem 5.16. For every E > 0,
$$R_{AVS}(E) = \max_{P \in \bar{\mathcal{P}}}\; \max_{Q:\, D(Q \| P) \le E} H_Q(X). \quad (5.39)$$

The formula (5.39) represents the analytical form of the E-optimal rate function for AVS coding obtained by Fu and Shen [77] while treating the hypothesis testing problem for AVS with exponential-type constraint. The counterpart of (5.39), also derived in [77], is the maximum error exponent $E_{AVS}(R)$ in AVS coding, which directly follows from Theorem 5.15.

Theorem 5.17.
$$E_{AVS}(R) = \min_{P \in \bar{\mathcal{P}}}\; \min_{Q:\, H_Q(X) \ge R} D(Q \,\|\, P) \quad (5.40)$$
for
$$\max_{P \in \bar{\mathcal{P}}} H_P(X) < R \le \log |\mathcal{X}|.$$
The formula (5.40) is easily verifiable from (5.38) and (5.18).



Corollary 5.7. As E → 0 and ∆ = 0, the asymptotically optimal lossless coding rate $R_{AVS}$ for the AVS (cf. also [5]) follows from (5.36):
$$R_{AVS} = \max_{P \in \bar{\mathcal{P}}} H_P(X).$$

Corollary 5.8. R(E, ∆) and R(E, ∆, P ) have the same limit, namely the zero-error rate-distortion function $\bar{R}(\Delta)$ (see [51]), as E → ∞. In particular, (5.36) or (5.37) implies
$$\bar{R}(\Delta) = \max_{P \in \mathcal{P}(\mathcal{X})} R(\Delta, P).$$

These were the main issues concerning AVS coding that we aimed to survey here. From the claims cited above one can be convinced of the generality of the results on the subject of source coding under distortion and error exponent criteria; they specialize effortlessly to the DMS models, which have been studied in the field more intensively.

Example 5.1. Let X be a binary Hamming AVS changing its distribution law at each time instant in an arbitrary "ε-floating" manner. This model assumes that X follows either the PD $P_1 \triangleq \{p, 1 - p\}$ or $P_2 \triangleq \{p + \varepsilon, 1 - p - \varepsilon\}$, as a matter of the state at which it stays. The rate-reliability-distortion function $R_{BH}(E, \Delta, \varepsilon)$ of this kind of source can be readily derived from Theorem 5.14 in view of Theorem 5.9. For every E > 0 and ∆ ≥ 0,
$$R_{BH}(E, \Delta, \varepsilon) = \begin{cases} \displaystyle\max_{Q:\; p \le q \le p + \varepsilon,\; D(Q_E \| Q) = E} H_{Q_E}(X) - H_\Delta(X), & \text{if (a)}, \\ 1 - H_\Delta(X), & \text{if (b)}, \\ 0, & \text{if (c)}, \end{cases}$$
where

(a) $[p, p + \varepsilon] \cap [\alpha_1, \alpha_2] = \emptyset$, $\Delta \le p_{\max}(\varepsilon)$,

(b) $[p, p + \varepsilon] \cap [\alpha_1, \alpha_2] \ne \emptyset$, $\Delta \le p_{\max}(\varepsilon)$,

(c) $\Delta > p_{\max}(\varepsilon)$,

with $Q \triangleq \{q, 1 - q\}$ and $Q_E \triangleq \{q_E, 1 - q_E\}$ binary PDs on the source alphabet X ,
$$p_{\max}(\varepsilon) \triangleq \max_{Q:\; p \le q \le p + \varepsilon}\; \max_{Q':\; D(Q' \| Q) \le E} \min\{q', 1 - q'\},$$
and $Q' \triangleq \{q', 1 - q'\}$ another binary PD on X .
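A direct numerical instance of the double maximization, taken here for the ∆ = 0 slice (i.e., formula (5.39)) of the "ε-floating" source of this example, can be obtained by brute force; the grids below are our illustrative choice and the names are ours, logs in bits. For ∆ > 0 one would subtract $H_\Delta(X)$ in the positive branch, as in (5.32).

```python
import math
h  = lambda p: 0.0 if p in (0, 1) else -p*math.log2(p) - (1-p)*math.log2(1-p)
dk = lambda p, q: sum(a*math.log2(a/b) for a, b in ((p, q), (1-p, 1-q)) if a > 0)

def r_avs(E, p, eps, outer=200, inner=2000):
    """Double maximization (5.39): states q in [p, p+eps] form the convex hull."""
    states = [p + eps * i / outer for i in range(outer + 1)]
    return max(h(j / inner)
               for q in states
               for j in range(1, inner)
               if dk(j / inner, q) <= E)

print(r_avs(0.1, 0.2, 0.1))   # worst-state lossless rate at reliability E = 0.1
```

The outer grid sweeps the state interval and the inner grid the divergence ball around each state; the worst state is the one closest to the equiprobable PD.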
6 Reliability Criterion in Multiterminal Source Coding

6.1 Robust Descriptions System


We illustrate the rate-reliability approach in source coding by analysis
of a multiterminal system. The rate-distortion solution for this system
has a simple characterization, which follows from the result for the
ordinary rate-distortion function. But the problem with the reliability
criterion has a non-trivial solution and interesting specific nuances.
We consider a source coding problem with fidelity and reliability
criteria for the robust descriptions system with one encoder and many
decoders. The rate-reliability-distortion function will be specified. An
example will be given showing a distinction in calculation of the rate-
reliability-distortion function and the rate-distortion one. The original
elaboration was published in [121].
A model of multiple descriptions, earlier studied by El Gamal
and Cover [67] and named robust descriptions system, is presented
in Figure 6.1. Some models of multiple description systems have been
considered also in [7, 26, 29, 30, 67, 123, 140, 170, 220, 226].
Messages of a DMS encoded by one encoder must be transmitted
to K different receivers. Each of them, based upon the same codeword,

103

Fig. 6.1 Robust descriptions system.

has to restore the original message within a distortion and with a reliability both acceptable to that receiver.

As in Section 5, the source X is defined by a sequence $\{X_i\}_{i=1}^\infty$ of discrete i.i.d. RVs taking values in the finite alphabet X . The source generic PD P ∗ and the N -length vector probabilities are defined according to (5.1) and (5.2).

The finite sets $\mathcal{X}^k$, k = 1, . . . , K, in general different from X and from each other, are the reproduction alphabets of the corresponding receivers. Let $d^k : \mathcal{X} \times \mathcal{X}^k \to [0; \infty)$ be a single-letter distortion criterion between the source and the kth reconstruction alphabet, as in (5.3), k = 1, . . . , K. The distortion measures for sequences $\mathbf{x} \in \mathcal{X}^N$ and $\mathbf{x}^k \in \mathcal{X}^{kN}$ of length N are averaged from the per-component distortions:
$$d^k(\mathbf{x}, \mathbf{x}^k) \triangleq \frac{1}{N} \sum_{n=1}^N d^k(x_n, x_n^k), \quad k = 1, \dots, K.$$
A code $(f, g) \triangleq (f, g^1, g^2, \dots, g^K)$ is a family of (K + 1) mappings: one encoder $f : \mathcal{X}^N \to \{1, 2, \dots, L(N)\}$ that feeds K separate decoders $g^k : \{1, 2, \dots, L(N)\} \to \mathcal{X}^{kN}$, k = 1, . . . , K.

The system designer looks for an appropriate coding strategy to guarantee the recovery of the source messages for each addressee k, satisfying the latter's requirement of distortion not more than $\Delta^k$ with reliability $E^k$, k = 1, . . . , K.

For given distortion levels $\Delta^k \ge 0$, k = 1, . . . , K, we consider the sets
$$\mathcal{A}^k \triangleq \{\mathbf{x} : g^k(f(\mathbf{x})) = \mathbf{x}^k,\; d^k(\mathbf{x}, \mathbf{x}^k) \le \Delta^k\}, \quad k = 1, \dots, K,$$
and the error probabilities of the code (f, g):
$$e^k(f, g^k, P^*, \Delta^k, N) \triangleq 1 - P^{*N}(\mathcal{A}^k), \quad k = 1, \dots, K.$$
For brevity we denote $E \triangleq (E^1, \dots, E^K)$ and $\Delta \triangleq (\Delta^1, \dots, \Delta^K)$.

Now, similarly to the conditions (5.5) and (5.6), we define the achievability of a rate by the reliability criterion and the rate-reliability-distortion dependence for this multiterminal system. A number R ≥ 0 is called an (E, ∆)-achievable rate for $E^k > 0$, $\Delta^k \ge 0$, k = 1, . . . , K, if for every ε > 0, δ > 0, and sufficiently large N there exists a code (f, g) such that
$$\frac{1}{N} \log L(N) \le R + \varepsilon,$$
$$e^k(f, g^k, P^*, \Delta^k, N) \le \exp\{-N(E^k - \delta)\}, \quad k = 1, \dots, K.$$

Denote the rate-reliability-distortion function for this system by R(E, ∆, P ∗ ); it is the minimum of the (E, ∆)-achievable rates for all components of E and ∆. Let R(∆, P ∗ ) be the corresponding rate-distortion function, which was specified by El Gamal and Cover in [67]. R(∆, P ∗ ) is the limit function of R(E, ∆, P ∗ ) as $E^k \to 0$, k = 1, . . . , K.

We specify the rate-reliability-distortion function [115, 121]. We also give an example of the calculation of the rate-reliability-distortion function for the case of a binary source with two receivers and Hamming distortion measures. This example demonstrates some peculiarities of the rate-reliability-distortion function as a function of E and ∆ in different particular cases. It is shown that, in contradistinction to the case of the rate-distortion function, where the smallest distortion component is decisive, for the rate-reliability-distortion one the decisive element may be the greatest reliability component. Some possible curves for R(E, ∆, P ∗ ) are illustrated in Figures 6.2–6.6.

Now we formulate the result. Let P = {P (x), x ∈ X } be a PD on X and
$$Q = \{Q(x^1, \dots, x^K \mid x),\; x \in \mathcal{X},\; x^k \in \mathcal{X}^k,\; k = 1, \dots, K\}$$



Fig. 6.2 (a) RBH (E 1 , ∆1 , P ∗ ) for p∗ = 0.15, ∆1 = 0.1, (b) for p∗ = 0.15, ∆1 = 0.3.

be a conditional PD on $\mathcal{X}^1 \times \cdots \times \mathcal{X}^K$ for given x. We shall also use the marginal conditional PDs
$$Q(x^k \mid x) \triangleq \sum_{x^j \in \mathcal{X}^j,\; j = 1, \dots, K,\; j \ne k} Q(x^1, \dots, x^K \mid x), \quad k = 1, \dots, K.$$
Assume without loss of generality that
$$0 < E^1 \le E^2 \le \cdots \le E^K.$$

Fig. 6.3 RBH (E 1 , ∆1 , P ∗ ) as a function of E 1 and ∆1 .

Fig. 6.4 RBH (E, ∆, P ∗ ) as a function of ∆1 and ∆2 for P ∗ = (0.15, 0.85), E 1 = 0.09,
E 2 = 0.49.

Remark that then $\alpha(E^k, P^*) \subseteq \alpha(E^{k+1}, P^*)$, k = 1, . . . , K − 1. (For the definition of $\alpha(E^k, P^*)$ see (5.11).)

Let, for each j ∈ {1, 2, . . . , K}, the set $\mathcal{Q}_j(P, \Delta)$ be the collection of those conditional PDs $Q_P(x^1, \dots, x^K \mid x)$ for which, with the given P , the following conditions hold:
$$\mathbb{E}_{P,Q_P} d^k(X, X^k) = \sum_{x, x^k} P(x)\, Q_P(x^k \mid x)\, d^k(x, x^k) \le \Delta^k, \quad k = j, \dots, K.$$

Fig. 6.5 RBH (E, ∆, P ∗ ) as a function of E 1 and E 2 for P ∗ = (0.15, 0.85), ∆1 = 0.1,
∆2 = 0.13.

Fig. 6.6 RBH (E, ∆, P ∗ ) for P ∗ = (0.15, 0.85), ∆1 = 0.1, ∆2 = 0.13, E 1 = 0.09.

The function $R^*(E, \Delta, P^*)$, determined as
$$\begin{aligned}
R^*(E, \Delta, P^*) \triangleq \max\Bigl[ &\max_{P \in \alpha(E^1, P^*)}\; \min_{Q_P \in \mathcal{Q}_1(P, \Delta)} I_{P,Q_P}(X \wedge X^1, \dots, X^K), \\
&\max_{P \in \alpha(E^2, P^*) - \alpha(E^1, P^*)}\; \min_{Q_P \in \mathcal{Q}_2(P, \Delta)} I_{P,Q_P}(X \wedge X^2, \dots, X^K),\; \dots, \\
&\max_{P \in \alpha(E^K, P^*) - \alpha(E^{K-1}, P^*)}\; \min_{Q_P \in \mathcal{Q}_K(P, \Delta)} I_{P,Q_P}(X \wedge X^K) \Bigr],
\end{aligned}$$

helps to state the rate-reliability-distortion function for the robust descriptions problem.

Theorem 6.1. For every $E^k > 0$, $\Delta^k \ge 0$, k = 1, . . . , K,
$$R(E, \Delta, P^*) = R^*(E, \Delta, P^*).$$

Corollary 6.1. When $E^k = E$, k = 1, . . . , K,
$$R(E, \Delta, P^*) = \max_{P \in \alpha(E, P^*)}\; \min_{Q_P \in \mathcal{Q}_1(P, \Delta)} I_{P,Q_P}(X \wedge X^1, \dots, X^K).$$

Corollary 6.2. (El Gamal and Cover [67].) As E → 0, we obtain the rate-distortion function
$$R(\Delta, P^*) = \min_{Q_{P^*} \in \mathcal{Q}_1(P^*, \Delta)} I_{P^*,Q_{P^*}}(X \wedge X^1, \dots, X^K).$$

The proof of Theorem 6.1 may be found in [121]; it is a natural extension of the proof of Theorem 5.4. The direct part of that theorem is proved by constructing hierarchical type coverings applying Lemma 5.5.

Exploration of an example. We specify the rate-reliability-distortion function R(E, ∆) for the robust descriptions system of Figure 6.1 for the case of a binary source with K = 2 and Hamming distortion measures.

The binary source is characterized by the alphabets $\mathcal{X} = \mathcal{X}^1 = \mathcal{X}^2 = \{0, 1\}$, the generic PD P ∗ = (p∗ , 1 − p∗ ) and the Hamming distortion measures
$$d^1(x, x^1) = \begin{cases} 0, & \text{for } x = x^1, \\ 1, & \text{for } x \ne x^1, \end{cases} \qquad d^2(x, x^2) = \begin{cases} 0, & \text{for } x = x^2, \\ 1, & \text{for } x \ne x^2. \end{cases}$$
The binary Hamming rate-reliability-distortion function is denoted by $R_{BH}(E, \Delta, P^*)$, and the binary Hamming rate-distortion one for the robust descriptions by $R_{BH}(\Delta, P^*)$ [121]. They are the extensions of the corresponding functions treated in Section 5.4.

For $E^1$ such that $(1/2, 1/2) \in \alpha(E^1, P^*)$, Theorem 5.9 results in the following reduction:
$$R_{BH}(E^1, \Delta^1, P^*) = \begin{cases} 1 - H_{\Delta^1}(X), & \text{when } \Delta^1 \le 1/2, \\ 0, & \text{when } \Delta^1 > 1/2. \end{cases} \quad (6.1)$$
Let $P_{E^1} = (p_{E^1}, 1 - p_{E^1})$, where $p_{E^1}$ is the solution nearest to 1/2 of the equation
$$D(P_{E^1} \,\|\, P^*) = E^1.$$
$P_{E^2}$ is defined analogously.

For $E^1$ such that $(1/2, 1/2) \notin \alpha(E^1, P^*)$,
$$R_{BH}(E^1, \Delta^1, P^*) = \begin{cases} H_{P_{E^1}}(X) - H_{\Delta^1}(X), & \text{when } \Delta^1 \le p_{E^1}, \\ 0, & \text{when } \Delta^1 > p_{E^1}. \end{cases} \quad (6.2)$$

As we have seen in Section 5, $R_{BH}(E^1, \Delta^1, P^*)$ is convex as a function of $\Delta^1$; as a function of $E^1$ it is concave for $\Delta^1$ such that $R_{BH}(\Delta^1, P^*) > 0$ (see Figure 6.2(a)). For $\Delta^1$ such that $R_{BH}(\Delta^1, P^*) = 0$ it is concave when $E^1 > E^1(\Delta^1)$, where $E^1(\Delta^1)$ is the minimal $E^1$ for which $\Delta^1 \le p_{E^1}$ (see Figure 6.2(b)). $R_{BH}(E^1, \Delta^1, P^*)$ as a function of $E^1$ and $\Delta^1$ is illustrated in Figure 6.3.

Consider the case $\Delta^1 < \Delta^2$. It is not difficult to verify that in this case the binary Hamming rate-distortion function $R_{BH}(\Delta, P^*)$ is determined by the demand of the receiver with the smallest distortion level. Therefore, we have
$$R_{BH}(\Delta, P^*) = R_{BH}(\Delta^1, P^*) = \begin{cases} H_{P^*}(X) - H_{\Delta^1}(X), & \text{if } \Delta^1 \le \min(p^*, 1 - p^*), \\ 0, & \text{if } \Delta^1 > \min(p^*, 1 - p^*). \end{cases}$$

For the case $E^1 < E^2$, Theorem 6.1 gives
$$R_{BH}(E, \Delta, P^*) = \max\Bigl[ \max_{P \in \alpha(E^1, P^*)}\; \min_{Q_P \in \mathcal{Q}_1(P, \Delta)} I_{P,Q_P}(X \wedge X^1, X^2),\; \max_{P \in \alpha(E^2, P^*) - \alpha(E^1, P^*)}\; \min_{Q_P \in \mathcal{Q}_2(P, \Delta)} I_{P,Q_P}(X \wedge X^2) \Bigr].$$

By analogy with the calculation of the binary Hamming rate-reliability-distortion function $R_{BH}(E^1, \Delta^1, P^*)$ we have the following equalities.

For $E^1$ such that $(1/2, 1/2) \in \alpha(E^1, P^*)$,
$$\max_{P \in \alpha(E^1, P^*)}\; \min_{Q_P \in \mathcal{Q}_1(P, \Delta)} I_{P,Q_P}(X \wedge X^1, X^2) = \begin{cases} 1 - H_{\Delta^1}(X), & \text{when } \Delta^1 \le 1/2, \\ 0, & \text{when } \Delta^1 > 1/2. \end{cases}$$

For $E^1$ such that $(1/2, 1/2) \notin \alpha(E^1, P^*)$,
$$\max_{P \in \alpha(E^1, P^*)}\; \min_{Q_P \in \mathcal{Q}_1(P, \Delta)} I_{P,Q_P}(X \wedge X^1, X^2) = \begin{cases} H_{P_{E^1}}(X) - H_{\Delta^1}(X), & \text{when } \Delta^1 \le p_{E^1}, \\ 0, & \text{when } \Delta^1 > p_{E^1}. \end{cases}$$

For $E^2$ such that $(1/2, 1/2) \in \alpha(E^2, P^*) - \alpha(E^1, P^*)$,
$$\max_{P \in \alpha(E^2, P^*) - \alpha(E^1, P^*)}\; \min_{Q_P \in \mathcal{Q}_2(P, \Delta)} I_{P,Q_P}(X \wedge X^2) = \begin{cases} 1 - H_{\Delta^2}(X), & \text{when } \Delta^2 \le 1/2, \\ 0, & \text{when } \Delta^2 > 1/2. \end{cases}$$

For $E^2$ such that $(1/2, 1/2) \notin \alpha(E^2, P^*) - \alpha(E^1, P^*)$,
$$\max_{P \in \alpha(E^2, P^*) - \alpha(E^1, P^*)}\; \min_{Q_P \in \mathcal{Q}_2(P, \Delta)} I_{P,Q_P}(X \wedge X^2) = \begin{cases} H_{P_{E^2}}(X) - H_{\Delta^2}(X), & \text{when } \Delta^2 \le p_{E^2}, \\ 0, & \text{when } \Delta^2 > p_{E^2}. \end{cases}$$
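Per the case equalities above, for $E^1 < E^2$ the two-receiver function reduces to the larger of the two single-receiver values. The sketch below is our illustration of that reduction (crude grid evaluation of (5.16) with the closed form (5.31); names ours, logs base 2); it exhibits a situation where receiver 2 is decisive even though its distortion level is the larger one.

```python
import math
h  = lambda p: 0.0 if p in (0, 1) else -p*math.log2(p) - (1-p)*math.log2(1-p)
dk = lambda p, q: sum(a*math.log2(a/b) for a, b in ((p, q), (1-p, 1-q)) if a > 0)

def rbh(E, delta, ps, n=20000):             # grid version of (5.16) with (5.31)
    ball = [i / n for i in range(1, n) if dk(i / n, ps) <= E]
    if delta > max(min(p, 1 - p) for p in ball):       # delta exceeds p_max
        return 0.0
    return max(h(p) for p in ball) - h(delta)

def rbh_two(E1, d1, E2, d2, ps):            # K = 2 robust descriptions, E1 < E2
    return max(rbh(E1, d1, ps), rbh(E2, d2, ps))

# Receiver 2 can be decisive although Delta^2 > Delta^1:
print(rbh_two(0.09, 0.10, 0.49, 0.13, 0.15))
```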

$R_{BH}(E, \Delta, P^*)$ as a function of $\Delta^1$ and $\Delta^2$ (for fixed $E^1$, $E^2$) is illustrated in Figure 6.4. An example of $R_{BH}(E, \Delta, P^*)$ as a function of $E^1$, $E^2$ (for fixed $\Delta^1$, $\Delta^2$) is shown in Figure 6.5.

It is apparent that $R_{BH}(E, \Delta, P^*)$ as a function of $E^2$, when $(1/2, 1/2) \in \alpha(E^1, P^*)$, is constant and equals its maximum possible

value $1 - H_{\Delta^1}(X)$. Let us assume that
$$\Delta^1 < \min(p^*, 1 - p^*). \quad (6.3)$$
Then $R_{BH}(\Delta, P^*) = H_{P^*}(X) - H_{\Delta^1}(X)$.

Consider the following two cases for fixed $\Delta^1$ satisfying (6.3) and $E^1$ such that $(1/2, 1/2) \notin \alpha(E^1, P^*)$, $\Delta^1 < p_{E^1}$.

For $E^2$ such that $(1/2, 1/2) \in \alpha(E^2, P^*)$, $\Delta^2 < \frac{1}{2}$, we obtain
$$R_{BH}(E, \Delta, P^*) = \max[H_{P_{E^1}}(X) - H_{\Delta^1}(X),\; 1 - H_{\Delta^2}(X)],$$
and when
$$H_{P_{E^1}}(X) - H_{\Delta^1}(X) < 1 - H_{\Delta^2}(X), \quad (6.4)$$
then
$$R_{BH}(E, \Delta, P^*) = R_{BH}(E^2, \Delta^2, P^*) = 1 - H_{\Delta^2}(X).$$

For $E^2$ such that $(1/2, 1/2) \notin \alpha(E^2, P^*)$, $\Delta^2 < p_{E^2}$, we get
$$R_{BH}(E, \Delta, P^*) = \max[H_{P_{E^1}}(X) - H_{\Delta^1}(X),\; H_{P_{E^2}}(X) - H_{\Delta^2}(X)],$$
and when
$$H_{P_{E^1}}(X) - H_{\Delta^1}(X) < H_{P_{E^2}}(X) - H_{\Delta^2}(X),$$
then
$$R_{BH}(E, \Delta, P^*) = H_{P_{E^2}}(X) - H_{\Delta^2}(X).$$

Note that $R_{BH}(E, \Delta, P^*)$ as a function of $E^2$, when $(1/2, 1/2) \notin \alpha(E^1, P^*)$ and (6.4) does not hold, is constant and equals $H_{P_{E^1}}(X) - H_{\Delta^1}(X)$.

$R_{BH}(E, \Delta, P^*)$ as a function of $E^2$, when $(1/2, 1/2) \notin \alpha(E^1, P^*)$ and (6.4) is valid, is presented in Figure 6.6.

6.2 Cascade System Coding Rates with Respect


to Distortion and Reliability Criteria
In this section, we discuss a generalization [122] of the multiterminal
communication system first studied by Yamamoto [223]. Messages of

Fig. 6.7 Cascade communication system.

two correlated sources X and Y coded by a common encoder and a separate encoder must be transmitted to two receivers by two common decoders within prescribed distortion levels $\Delta_x^1, \Delta_y^1$ and $\Delta_x^2, \Delta_y^2$, respectively (see Figure 6.7).

The region $\mathcal{R}(E^1, E^2, \Delta_x^1, \Delta_y^1, \Delta_x^2, \Delta_y^2, P^*)$ of all achievable rates of the best codes, ensuring reconstruction of messages of the sources X and Y within given distortion levels with error probability exponents $E^1$ and $E^2$ at the first and second decoders, respectively, is presented. Important consequent cases are noted. The problem is a direct generalization of the problem considered by Yamamoto.
Let $\{X_i, Y_i\}_{i=1}^\infty$ be a sequence of discrete i.i.d. pairs of RVs taking values in the finite sets X and Y, which are the sets of all messages of the sources X and Y , respectively. Let the generic joint PD of the messages of the two sources be
$$P^* = \{P^*(x, y),\; x \in \mathcal{X},\; y \in \mathcal{Y}\}.$$
For memoryless sources the probability $P^{*N}(\mathbf{x}, \mathbf{y})$ of a pair of N -sequences of messages $\mathbf{x} = (x_1, x_2, \dots, x_N) \in \mathcal{X}^N$ and $\mathbf{y} = (y_1, y_2, \dots, y_N) \in \mathcal{Y}^N$ is defined by
$$P^{*N}(\mathbf{x}, \mathbf{y}) = \prod_{n=1}^N P^*(x_n, y_n).$$
The sets $\mathcal{X}^1, \mathcal{X}^2$ and $\mathcal{Y}^1, \mathcal{Y}^2$ are reconstruction alphabets, in general different from X and Y, respectively. Let
$$d_x^1 : \mathcal{X} \times \mathcal{X}^1 \to [0, \infty), \qquad d_x^2 : \mathcal{X} \times \mathcal{X}^2 \to [0, \infty),$$
$$d_y^1 : \mathcal{Y} \times \mathcal{Y}^1 \to [0, \infty), \qquad d_y^2 : \mathcal{Y} \times \mathcal{Y}^2 \to [0, \infty)$$

be the corresponding distortion measures. The distortion measures for N -sequences are defined by the respective averages of the per-letter distortions:
$$d_x^1(\mathbf{x}, \mathbf{x}^1) = N^{-1} \sum_{n=1}^N d_x^1(x_n, x_n^1), \qquad d_x^2(\mathbf{x}, \mathbf{x}^2) = N^{-1} \sum_{n=1}^N d_x^2(x_n, x_n^2),$$
$$d_y^1(\mathbf{y}, \mathbf{y}^1) = N^{-1} \sum_{n=1}^N d_y^1(y_n, y_n^1), \qquad d_y^2(\mathbf{y}, \mathbf{y}^2) = N^{-1} \sum_{n=1}^N d_y^2(y_n, y_n^2),$$
where $\mathbf{x} \in \mathcal{X}^N$, $\mathbf{y} \in \mathcal{Y}^N$, $\mathbf{x}^1 \in \mathcal{X}^{1N}$, $\mathbf{x}^2 \in \mathcal{X}^{2N}$, $\mathbf{y}^1 \in \mathcal{Y}^{1N}$, $\mathbf{y}^2 \in \mathcal{Y}^{2N}$.

For the considered system the code $(f, \hat{f}, g, \hat{g})$ is a family of four mappings:
$$f : \mathcal{X}^N \times \mathcal{Y}^N \to \{1, 2, \dots, K(N)\},$$
$$\hat{f} : \{1, 2, \dots, K(N)\} \to \{1, 2, \dots, L(N)\},$$
$$g : \{1, 2, \dots, K(N)\} \to \mathcal{X}^{1N} \times \mathcal{Y}^{1N},$$
$$\hat{g} : \{1, 2, \dots, L(N)\} \to \mathcal{X}^{2N} \times \mathcal{Y}^{2N}.$$
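To fix the information flow of these four mappings, here is a schematic sketch (ours; index sets modeled as plain integers, all names are assumptions) of how a cascade code processes a source pair.

```python
from typing import Callable, Tuple

Seq = tuple                                   # an N-sequence as a plain tuple
Encoder   = Callable[[Seq, Seq], int]         # f : X^N x Y^N -> {1,...,K(N)}
ReEncoder = Callable[[int], int]              # f_hat : {1,...,K(N)} -> {1,...,L(N)}
Decoder1  = Callable[[int], Tuple[Seq, Seq]]  # g : {1,...,K(N)} -> X^1N x Y^1N
Decoder2  = Callable[[int], Tuple[Seq, Seq]]  # g_hat : {1,...,L(N)} -> X^2N x Y^2N

def transmit(x: Seq, y: Seq, f: Encoder, f_hat: ReEncoder,
             g: Decoder1, g_hat: Decoder2):
    k = f(x, y)               # common description, rate N^{-1} log K(N)
    l = f_hat(k)              # re-encoded description, rate N^{-1} log L(N)
    return g(k), g_hat(l)     # reconstructions at the first and second decoders
```

The second decoder never sees the index k directly, only its re-encoding l, which is what makes the configuration a cascade rather than a broadcast of two independent descriptions.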
Define the sets
$$\mathcal{A}^i \triangleq \{(\mathbf{x}, \mathbf{y}) : g(f(\mathbf{x}, \mathbf{y})) = (\mathbf{x}^1, \mathbf{y}^1),\; \hat{g}(\hat{f}(f(\mathbf{x}, \mathbf{y}))) = (\mathbf{x}^2, \mathbf{y}^2),\; d_x^i(\mathbf{x}, \mathbf{x}^i) \le \Delta_x^i,\; d_y^i(\mathbf{y}, \mathbf{y}^i) \le \Delta_y^i\}, \quad i = 1, 2.$$
For given levels of admissible distortions $\Delta_x^1 \ge 0$, $\Delta_x^2 \ge 0$, $\Delta_y^1 \ge 0$, $\Delta_y^2 \ge 0$, the error probabilities of the code are
$$e^i(f, \hat{f}, g, \hat{g}, P^*, \Delta_x^i, \Delta_y^i) = 1 - P^{*N}(\mathcal{A}^i), \quad i = 1, 2.$$
Again, for brevity we denote $E \triangleq (E^1, E^2)$ and $\Delta \triangleq (\Delta_x^1, \Delta_y^1, \Delta_x^2, \Delta_y^2)$. A pair of two nonnegative numbers $(R, \hat{R})$ is said to be an (E, ∆)-achievable rate pair for $E^1 > 0$, $E^2 > 0$, $\Delta_x^1 \ge 0$, $\Delta_x^2 \ge 0$, $\Delta_y^1 \ge 0$, $\Delta_y^2 \ge 0$, if for arbitrary ε > 0, δ > 0, and N sufficiently large there exists a code $(f, \hat{f}, g, \hat{g})$ such that
$$N^{-1} \log K(N) \le R + \varepsilon, \qquad N^{-1} \log L(N) \le \hat{R} + \varepsilon,$$
and
$$e^i(f, \hat{f}, g, \hat{g}, P^*, \Delta_x^i, \Delta_y^i) \le \exp\{-N(E^i - \delta)\}, \quad i = 1, 2. \quad (6.5)$$

Let R(E, ∆, P ∗ ) be the set of all pairs of (E, ∆)-achievable rates and R(∆, P ∗ ) the corresponding rate-distortion region. If at the first decoder only the messages of the source X and at the second decoder only the messages of the source Y are reconstructed, then R(∆, P ∗ ) becomes the set of all $(\Delta_x^1, \Delta_y^2)$-achievable rate pairs $\mathcal{R}_1(\Delta_x^1, \Delta_y^2, P^*)$ studied by Yamamoto in [223].

We present the rate-reliability-distortion region R(E, ∆, P ∗ ) without proof. Let P = {P (x, y), x ∈ X , y ∈ Y} be some PD on X × Y and
$$Q = \{Q(x^1, y^1, x^2, y^2 \mid x, y),\; x \in \mathcal{X},\; y \in \mathcal{Y},\; x^1 \in \mathcal{X}^1,\; y^1 \in \mathcal{Y}^1,\; x^2 \in \mathcal{X}^2,\; y^2 \in \mathcal{Y}^2\}$$
be a conditional PD on $\mathcal{X}^1 \times \mathcal{Y}^1 \times \mathcal{X}^2 \times \mathcal{Y}^2$ for given x and y.


We proceed with the notations and definitions taking into account two possible cases regarding the reliability constraints.

(1) $E^1 \le E^2$. Let Q(P, E, ∆) be the set of those conditional PDs $Q_P(x^1, y^1, x^2, y^2 \mid x, y)$ for which, if $P \in \alpha(E^1, P^*)$, the following conditions are satisfied:
$$\mathbb{E}_{P,Q_P} d_x^1(X, X^1) = \sum_{x, y, x^1, y^1, x^2, y^2} P(x, y)\, Q_P(x^1, y^1, x^2, y^2 \mid x, y)\, d_x^1(x, x^1) \le \Delta_x^1, \quad (6.6)$$
$$\mathbb{E}_{P,Q_P} d_y^1(Y, Y^1) = \sum_{x, y, x^1, y^1, x^2, y^2} P(x, y)\, Q_P(x^1, y^1, x^2, y^2 \mid x, y)\, d_y^1(y, y^1) \le \Delta_y^1, \quad (6.7)$$
$$\mathbb{E}_{P,Q_P} d_x^2(X, X^2) = \sum_{x, y, x^1, y^1, x^2, y^2} P(x, y)\, Q_P(x^1, y^1, x^2, y^2 \mid x, y)\, d_x^2(x, x^2) \le \Delta_x^2, \quad (6.8)$$
$$\mathbb{E}_{P,Q_P} d_y^2(Y, Y^2) = \sum_{x, y, x^1, y^1, x^2, y^2} P(x, y)\, Q_P(x^1, y^1, x^2, y^2 \mid x, y)\, d_y^2(y, y^2) \le \Delta_y^2; \quad (6.9)$$
otherwise, if $P \in \alpha(E^2, P^*) - \alpha(E^1, P^*)$, then only (6.8) and (6.9) hold.

(2) $E^2 \le E^1$. If $P \in \alpha(E^2, P^*)$, let the set Q(P, E, ∆) consist of those conditional PDs $Q_P$ which make the inequalities (6.6)–(6.9) valid; otherwise, if $P \in \alpha(E^1, P^*) - \alpha(E^2, P^*)$, only (6.6) and (6.7) are satisfied. When $E^1 = E^2 = 0$, the notation Q(P ∗ , ∆) specializes Q(P, E, ∆).
Introduce the following sets:

(1) for $E^1 \le E^2$,
$$\mathcal{R}_1(E, \Delta, P^*) = \bigcap_{P \in \alpha(E^1, P^*)}\; \bigcup_{Q_P \in Q(P, E, \Delta)} \Bigl\{(R, \hat{R}) :\; R \ge I_{P,Q_P}(X, Y \wedge X^1, Y^1, X^2, Y^2),\; \hat{R} \ge I_{P,Q_P}(X, Y \wedge X^2, Y^2)\Bigr\},$$
$$\mathcal{R}_2(E, \Delta, P^*) = \bigcap_{P \in \alpha(E^2, P^*) - \alpha(E^1, P^*)}\; \bigcup_{Q_P \in Q(P, E, \Delta)} \Bigl\{(R, \hat{R}) :\; R \ge \hat{R},\; \hat{R} \ge I_{P,Q_P}(X, Y \wedge X^2, Y^2)\Bigr\};$$

(2) for $E^2 \le E^1$,
$$\mathcal{R}_3(E, \Delta, P^*) = \bigcap_{P \in \alpha(E^2, P^*)}\; \bigcup_{Q_P \in Q(P, E, \Delta)} \Bigl\{(R, \hat{R}) :\; R \ge \max\Bigl[I_{P,Q_P}(X, Y \wedge X^1, Y^1, X^2, Y^2),\; \max_{P \in \alpha(E^1, P^*) - \alpha(E^2, P^*)}\; \min_{Q_P \in Q(P, E, \Delta)} I_{P,Q_P}(X, Y \wedge X^1, Y^1)\Bigr],\; \hat{R} \ge I_{P,Q_P}(X, Y \wedge X^2, Y^2)\Bigr\}.$$

Theorem 6.2. For every $0 < E^1 \le E^2$, $\Delta_x^i \ge 0$, $\Delta_y^i \ge 0$, i = 1, 2,
$$\mathcal{R}(E, \Delta, P^*) = \mathcal{R}_1(E, \Delta, P^*) \cap \mathcal{R}_2(E, \Delta, P^*).$$
For every $0 < E^2 \le E^1$, $\Delta_x^i \ge 0$, $\Delta_y^i \ge 0$, i = 1, 2,
$$\mathcal{R}(E, \Delta, P^*) = \mathcal{R}_3(E, \Delta, P^*).$$

Corollary 6.3. When $E^1 = E^2 = E$, we obtain $\mathcal{R}_0(E, \Delta, P^*)$:
$$\mathcal{R}_0(E, \Delta, P^*) \triangleq \bigcap_{P \in \alpha(E, P^*)}\; \bigcup_{Q_P \in Q(P, E, \Delta)} \Bigl\{(R, \hat{R}) :\; R \ge I_{P,Q_P}(X, Y \wedge X^1, Y^1, X^2, Y^2),\; \hat{R} \ge I_{P,Q_P}(X, Y \wedge X^2, Y^2)\Bigr\}.$$
As E → 0 we derive the rate-distortion dependence R(∆, P ∗ ):
$$\mathcal{R}(\Delta, P^*) = \bigcup_{Q_{P^*} \in Q(P^*, \Delta)} \Bigl\{(R, \hat{R}) :\; R \ge I_{P^*,Q_{P^*}}(X, Y \wedge X^1, Y^1, X^2, Y^2),\; \hat{R} \ge I_{P^*,Q_{P^*}}(X, Y \wedge X^2, Y^2)\Bigr\}.$$

Corollary 6.4. If the first decoder recovers only the messages of the source X and the second decoder recovers only the messages of Y , we obtain the result by Yamamoto [223]:
$$\mathcal{R}(\Delta, P^*) = \bigcup_{Q_{P^*} \in Q(P^*, \Delta)} \Bigl\{(R, \hat{R}) :\; R \ge I_{P^*,Q_{P^*}}(X, Y \wedge X^1, Y^2),\; \hat{R} \ge I_{P^*,Q_{P^*}}(X, Y \wedge Y^2)\Bigr\}.$$

6.3 Reliability Criterion in Hierarchical Source Coding


and in Successive Refinement
In this section, we discuss the hierarchical source coding problem and
the relevant concept of successive refinement of information subject to
reliability criterion.

The problem treatment was originally published in [120] and later developed in an addendum [141]. In the proofs below we demonstrate the conditions necessary and sufficient for successive refinement under the reliability constraint. We reproduce the characterization of the rate-reliability-distortion region for hierarchical source coding and the derivation of the conditions for successive refinability subject to the reliability criterion according to [141].
The concept of source divisibility was introduced by Koshelev [155, 156, 157, 158] as an optimality criterion for source coding in hierarchical systems. The same notion, as successive refinement of information, was independently redefined by Equitz and Cover in [70]. They questioned the conditions under which it is possible to achieve the optimum performance bounds at each level of successively more precise transmission. In terms of rate-distortion limits the problem statement has the following description. Assume that we transmit information to two users; the requirement of the first on distortion is no larger than ∆1 , and the demand of the second user is more accurate: ∆2 ≤ ∆1 . The value R(∆1 , P ) of the rate-distortion function for a source distributed according to the probability law P is the minimal satisfactory transmission rate for the first destination. Adding information at a rate R′ addressed to the second user, the fidelity can be made more precise, providing distortion no larger than ∆2 . It is interesting to know when it is possible to guarantee the equality R(∆1 , P ) + R′ = R(∆2 , P ). The answer to this question is given in Koshelev's papers [155, 156, 157, 158], and in [70] by Equitz and Cover. Koshelev argued that the Markovity condition for the RVs characterizing the system is sufficient to achieve the rate-distortion limits; later the same condition was established to be also necessary in [70], where the authors exploited Ahlswede's result [7] on multiple descriptions without excess rate. We quote their result in Theorem 6.7. Another proof of that result, by means of the characterization of the rate-distortion region for the described hierarchical source coding situation, is given in Rimoldi's paper [187].
In [120], treating the above notion of successive refinement of information, an additional criterion on the quality of information reconstruction, the reliability, is introduced. The extension of the rate-distortion case to the rate-reliability-distortion one is the assumption that the required performance limit is the rate-reliability-distortion function. So, messages of the source must be coded for transmission to the receiver with a distortion not exceeding ∆1 within the reliability E1 , and then, using auxiliary information (again at a coding rate R′), restored with a more precise distortion ∆2 ≤ ∆1 within the reliability E2 . Naturally, but not necessarily, E2 ≥ E1 . Under the successive refinement of information (or divisibility of the source with a PD P ) with the reliability requirement, from (E1 , ∆1 ) to (E2 , ∆2 ), the condition R(E2 , ∆2 , P ) = R(E1 , ∆1 , P ) + R′ is assumed.
The characterization of the rate-distortion region for this hierarchical transmission (also called scalable source coding) has been independently obtained by Koshelev [156] and Rimoldi [187]. It is derived also by Maroutian [170] as a corollary of the characterization of the rate-reliability-distortion region for the same system. Later on, the error exponents in scalable source coding were studied in the paper [151] by Kanlis and Narayan.

In [206] Tuncel and Rose pointed out the difference between their result and the necessary and sufficient conditions for successive refinability under the error exponent criterion previously derived and reported in [120]. In the paper [141] the author revises [120] and the region [170] of attainable rates subject to the reliability criterion, which was the basis for the investigation [120]. Amending Maroutian's constructions and proofs, he restates the conditions by Tuncel and Rose [206].
First, we discuss the hierarchical source coding problem and the
results on the achievable rates region.
Let again P ∗ = {P ∗ (x), x ∈ X } be the PD of messages x of the DMS X with finite alphabet X . Let the reproduction alphabets of the two receivers be the finite sets $\mathcal{X}^1$ and $\mathcal{X}^2$, with the corresponding single-letter distortion measures $d^k : \mathcal{X} \times \mathcal{X}^k \to [0; \infty)$, k = 1, 2. The distortions $d^k(\mathbf{x}, \mathbf{x}^k)$ between the source N -length message x and its reproduced versions $\mathbf{x}^k$ are defined according to (5.4).

A code $(f, g) = (f_1, f_2, g_1, g_2)$ for the system (Figure 6.8) consists of two encoders (as mappings of the source N -length message space $\mathcal{X}^N$

Fig. 6.8 The hierarchical communication system.

into certain enumerated finite sets $\{1, 2, \dots, L_k(N)\}$):
$$f_k : \mathcal{X}^N \to \{1, 2, \dots, L_k(N)\}, \quad k = 1, 2,$$
and two decoders acting as converse mappings into the reproduction N -dimensional spaces $\mathcal{X}^{1N}$ and $\mathcal{X}^{2N}$ in the following ways:
$$g_1 : \{1, 2, \dots, L_1(N)\} \to \mathcal{X}^{1N},$$
$$g_2 : \{1, 2, \dots, L_1(N)\} \times \{1, 2, \dots, L_2(N)\} \to \mathcal{X}^{2N},$$
where in $g_2$ we deal with the Cartesian product of the two sets.

Let the requirement of the first user on the averaged distortion be ∆1 ≥ 0 and of the second one ∆2 ≥ 0. Thus the sets defined by
$$\mathcal{A}_1 \triangleq \{\mathbf{x} \in \mathcal{X}^N : g_1(f_1(\mathbf{x})) = \mathbf{x}^1,\; d^1(\mathbf{x}, \mathbf{x}^1) \le \Delta_1\},$$
$$\mathcal{A}_2 \triangleq \{\mathbf{x} \in \mathcal{X}^N : g_2(f_1(\mathbf{x}), f_2(\mathbf{x})) = \mathbf{x}^2,\; d^2(\mathbf{x}, \mathbf{x}^2) \le \Delta_2\}$$

will abbreviate the expressions which determine the probability of error (caused by an applied code (f, g)) at the output of each decoder:
$$e_k(f, g, P^*, \Delta_k, N) \triangleq 1 - P^{*N}(\mathcal{A}_k), \quad k = 1, 2.$$
We say that the nonnegative numbers $(R_1, R_2)$ make an $(E_1, E_2, \Delta_1, \Delta_2)$-achievable pair of coding rates if for every ε > 0, δ > 0, and sufficiently large N there exists a code (f, g) such that
$$\frac{1}{N} \log L_k(N) \le R_k + \varepsilon,$$
$$e_k(f, g, \Delta_k, N) \le \exp\{-N(E_k - \delta)\}, \quad k = 1, 2.$$

Denote the set of $(E_1, E_2, \Delta_1, \Delta_2)$-achievable rates for the system by $\mathcal{R}_{P^*}(E_1, E_2, \Delta_1, \Delta_2)$.

For a given quartet $(E_1, E_2, \Delta_1, \Delta_2)$ and a PD $P \in \alpha(E_1, P^*)$, let $Q(P, E_1, E_2, \Delta_1, \Delta_2)$ be the set of conditional PDs $Q_P(x^1, x^2 \mid x)$ for which the expectations of the distortions satisfy
$$\mathbb{E}_{P,Q_P} d^k(X, X^k) \triangleq \sum_{x, x^k} P(x)\, Q_P(x^k \mid x)\, d^k(x, x^k) \le \Delta_k, \quad k = 1, 2,$$
and only
$$\mathbb{E}_{P,Q_P} d^2(X, X^2) \le \Delta_2$$
if P belongs to the difference of the sets $\alpha(E_2, P^*)$ and $\alpha(E_1, P^*)$, i.e., $P \in \alpha(E_2, P^*) - \alpha(E_1, P^*)$; this is the case of $E_2 \ge E_1$.

Respectively, in the case of $E_1 \ge E_2$, under the notation $Q(P, E_1, E_2, \Delta_1, \Delta_2)$ we mean the set of conditional PDs $Q_P$ for which
$$\mathbb{E}_{P,Q_P} d^k(X, X^k) \le \Delta_k, \quad k = 1, 2,$$
when $P \in \alpha(E_2, P^*)$, and only
$$\mathbb{E}_{P,Q_P} d^1(X, X^1) \le \Delta_1$$
if $P \in \alpha(E_1, P^*) - \alpha(E_2, P^*)$.

For $E_1 = E_2 = E$ we use the notation $Q(P, E, \Delta_1, \Delta_2)$ instead of $Q(P, E_1, E_2, \Delta_1, \Delta_2)$; when E → 0, $\alpha(E, P^*)$ contains only the PD P ∗ , hence the notation $Q(P^*, \Delta_1, \Delta_2)$ will replace $Q(P, E, \Delta_1, \Delta_2)$.

Each of the pairs $(E_1, \Delta_1)$ and $(E_2, \Delta_2)$ in the considered hierarchical coding configuration of Figure 6.8 determines the corresponding rate-reliability-distortion and rate-distortion functions $R(E_k, \Delta_k, P^*)$ and $R(\Delta_k, P^*)$, k = 1, 2, respectively.
An attempt to find the entire $(E_1, E_2, \Delta_1, \Delta_2)$-achievable rates region $\mathcal{R}_{P^*}(E_1, E_2, \Delta_1, \Delta_2)$ was made in Maroutian's paper [170]. However, a revision [141] of his result, stimulated by [206], brought the author to a different region; nevertheless, in the particular case of $E_1, E_2 \to 0$, as a consequence, Maroutian obtained the answer for the hierarchical source coding problem under the fidelity criterion [157], which was also studied by Rimoldi [187]. A careful treatment of the code construction strategy based on the type covering technique (by the way, used also by Tuncel and Rose in [206]) and the combinatorial method for the converse employed in [170] help us in obtaining the results in Theorems 6.3 and 6.4, which together unify the refined result for the multiple descriptions [170] (or, in other terminology, for the hierarchical or scalable source coding).

Theorem 6.3. For $E_1, E_2$ with $E_1 \ge E_2 > 0$, and $\Delta_1 \ge 0$, $\Delta_2 \ge 0$,
$$\mathcal{R}_{P^*}(E_1, E_2, \Delta_1, \Delta_2) = \bigcap_{P \in \alpha(E_2, P^*)}\; \bigcup_{Q_P \in Q(P, E_1, E_2, \Delta_1, \Delta_2)} \Bigl\{(R_1, R_2) :\; R_1 \ge \max\Bigl[\max_{P \in \alpha(E_1, P^*) - \alpha(E_2, P^*)} R(\Delta_1, P),\; I_{P,Q_P}(X \wedge X^1)\Bigr],\; R_1 + R_2 \ge I_{P,Q_P}(X \wedge X^1, X^2)\Bigr\}. \quad (6.10)$$
Or, equivalently,
$$\mathcal{R}_{P^*}(E_1, E_2, \Delta_1, \Delta_2) = \bigcap_{P \in \alpha(E_2, P^*)}\; \bigcup_{Q_P \in Q(P, E_1, E_2, \Delta_1, \Delta_2)} \Bigl\{(R_1, R_2) :\; R_1 \ge \max\bigl(R(E_1, \Delta_1, P^*),\; I_{P,Q_P}(X \wedge X^1)\bigr),\; R_1 + R_2 \ge I_{P,Q_P}(X \wedge X^1, X^2)\Bigr\}. \quad (6.11)$$
Note that the equivalence of (6.10) and (6.11) is due to the fact that $R(\Delta_1, P) \le I_{P,Q_P}(X \wedge X^1)$ for each $P \in \alpha(E_2, P^*)$ and the representation (5.16).

Theorem 6.4. In the case of $E_2 \ge E_1$,
$$\mathcal{R}_{P^*}(E_1, E_2, \Delta_1, \Delta_2) = \bigcap_{P \in \alpha(E_1, P^*)}\; \bigcup_{Q_P \in Q(P, E_1, E_2, \Delta_1, \Delta_2)} \Bigl\{(R_1, R_2) :\; R_1 \ge I_{P,Q_P}(X \wedge X^1),\; R_1 + R_2 \ge \max\Bigl[\max_{P \in \alpha(E_2, P^*) - \alpha(E_1, P^*)} R(\Delta_2, P),\; I_{P,Q_P}(X \wedge X^1, X^2)\Bigr]\Bigr\}. \quad (6.12)$$
Or, equivalently,
$$\mathcal{R}_{P^*}(E_1, E_2, \Delta_1, \Delta_2) = \bigcap_{P \in \alpha(E_1, P^*)}\; \bigcup_{Q_P \in Q(P, E_1, E_2, \Delta_1, \Delta_2)} \Bigl\{(R_1, R_2) :\; R_1 \ge I_{P,Q_P}(X \wedge X^1),\; R_1 + R_2 \ge \max\bigl(R(E_2, \Delta_2, P^*),\; I_{P,Q_P}(X \wedge X^1, X^2)\bigr)\Bigr\}. \quad (6.13)$$
In this case, the equivalence of (6.12) and (6.13) is due to $R(\Delta_2, P) \le I_{P,Q_P}(X \wedge X^1, X^2)$ for each $P \in \alpha(E_1, P^*)$ and the representation (5.16).

Comparing these regions with the ones (specialized for the corresponding cases) derived in [206], one can conclude that the regions by Tuncel and Rose, formulated in terms of the scalable rate-reliability-distortion function, actually are alternative forms of $\mathcal{R}_{P^*}(E_1, E_2, \Delta_1, \Delta_2)$ characterized in Theorems 6.3 and 6.4.

In the case of equal requirements of the receivers on the reliability, i.e., $E_1 = E_2 = E$, we get from (6.10) and (6.12) a simpler region, denoted here by $\mathcal{R}_{P^*}(E, \Delta_1, \Delta_2)$.

Theorem 6.5. For every E > 0 and $\Delta_1 \ge 0$, $\Delta_2 \ge 0$,
$$\mathcal{R}_{P^*}(E, \Delta_1, \Delta_2) = \bigcap_{P \in \alpha(E, P^*)}\; \bigcup_{Q_P \in Q(P, E, \Delta_1, \Delta_2)} \Bigl\{(R_1, R_2) :\; R_1 \ge I_{P,Q_P}(X \wedge X^1),\; R_1 + R_2 \ge I_{P,Q_P}(X \wedge X^1, X^2)\Bigr\}. \quad (6.14)$$
Furthermore, with E → 0, the definition of the set α(E, P ∗ ) and (6.14) yield Koshelev's [156] result for the hierarchical source coding rate-distortion region, which independently appeared then also in [170] and [187].

Theorem 6.6. For every $\Delta_1 \ge 0$, $\Delta_2 \ge 0$, the rate-distortion region for the scalable source coding can be expressed as follows:
$$\mathcal{R}_{P^*}(\Delta_1, \Delta_2) = \bigcup_{Q_{P^*} \in Q(P^*, \Delta_1, \Delta_2)} \Bigl\{(R_1, R_2) :\; R_1 \ge I_{P^*,Q_{P^*}}(X \wedge X^1),\; R_1 + R_2 \ge I_{P^*,Q_{P^*}}(X \wedge X^1, X^2)\Bigr\}. \quad (6.15)$$

As in [120], we define the notion of successive refinability in terms of the rate-reliability-distortion function in the following way.

Definition 6.1. The DMS X with PD P ∗ is said to be successively refinable from $(E_1, \Delta_1)$ to $(E_2, \Delta_2)$ if the optimal rates pair
$$\bigl(R(E_1, \Delta_1, P^*),\; R(E_2, \Delta_2, P^*) - R(E_1, \Delta_1, P^*)\bigr) \quad (6.16)$$
is $(E_1, E_2, \Delta_1, \Delta_2)$-achievable, provided that $R(E_2, \Delta_2, P^*) \ge R(E_1, \Delta_1, P^*)$.

It is obvious that with Ek → 0, k = 1, 2, we have the definition of the successive refinement in the distortion sense [70, 157]. Another interesting special case is ∆1 = ∆2 = 0; then we deal with the successive refinement in the "purely" reliability sense, namely the achievability of the optimal rates
$$
\big(R(E_1,P^*),\ R(E_2,P^*)-R(E_1,P^*)\big)
$$
related to the corresponding rate-reliability functions (5.17) for E2 ≥ E1 (since only this condition ensures the inequality $R(E_2,P^*)\ge R(E_1,P^*)$).
Below we prove the conditions [141] for the successive refinement of information with respect to the above definition. The different conditions for the two cases and their proofs employ Theorems 6.3 and 6.4 on hierarchical source coding.

E1 ≥ E2 case: For this situation, from (6.11) it follows that the rates pair (6.16) is achievable iff for each P ∈ α(E2, P∗) there exists a QP ∈ Q(P, E1, E2, ∆1, ∆2) such that the inequalities
$$
R(E_1,\Delta_1,P^*)\ge\max\big(R(E_1,\Delta_1,P^*),\ I_{P,Q_P}(X\wedge X^1)\big), \qquad (6.17)
$$
$$
R(E_2,\Delta_2,P^*)\ge I_{P,Q_P}(X\wedge X^1,X^2) \qquad (6.18)
$$
hold simultaneously. These inequalities are satisfied for each P ∈ α(E2, P∗) iff
$$
R(E_1,\Delta_1,P^*)\ge I_{P,Q_P}(X\wedge X^1), \qquad (6.19)
$$
which is due to (6.17), and, meanwhile,
$$
R(E_2,\Delta_2,P^*)\ge I_{P,Q_P}(X\wedge X^1,X^2)\ge I_{P,Q_P}(X\wedge X^2)\ge R(\Delta_2,P^*) \qquad (6.20)
$$
for (6.18).
By the corollary (5.16) for the rate-reliability-distortion function it follows that (6.19) and (6.20) hold for each P ∈ α(E2, P∗) iff there exist a PD $\bar P\in\alpha(E_2,P^*)$ and a conditional PD $Q_{\bar P}\in\mathcal{Q}(\bar P,E_1,E_2,\Delta_1,\Delta_2)$ such that $X\to X^2\to X^1$ forms a Markov chain in that order and at the same time
$$
R(E_1,\Delta_1,P^*)\ge I_{\bar P,Q_{\bar P}}(X\wedge X^1), \qquad (6.21)
$$
$$
R(E_2,\Delta_2,P^*)= I_{\bar P,Q_{\bar P}}(X\wedge X^2). \qquad (6.22)
$$
So, in case of E1 ≥ E2 we get the conditions for the successive refinability under the reliability constraint defined by (6.21) and (6.22).
E2 ≥ E1 case: Taking into account the corresponding rate-reliability-distortion region (6.13), it follows that (R(E1, ∆1, P∗), R(E2, ∆2, P∗) − R(E1, ∆1, P∗)) is achievable iff for each P ∈ α(E1, P∗) there exists a QP ∈ Q(P, E1, E2, ∆1, ∆2) such that
$$
R(E_1,\Delta_1,P^*)\ge I_{P,Q_P}(X\wedge X^1) \qquad (6.23)
$$
and
$$
R(E_2,\Delta_2,P^*)\ge\max\big(R(E_2,\Delta_2,P^*),\ I_{P,Q_P}(X\wedge X^1,X^2)\big). \qquad (6.24)
$$

For each P ∈ α(E1, P∗), selecting $\bar Q_P$ as the conditional PD that minimizes the mutual information $I_{P,Q_P}(X\wedge X^1,X^2)$ among those which satisfy (6.23) and (6.24), the optimal pair of rates (R(E1, ∆1, P∗), R(E2, ∆2, P∗) − R(E1, ∆1, P∗)) will be achievable iff
$$
R(E_1,\Delta_1,P^*)\ge I_{P,\bar Q_P}(X\wedge X^1) \qquad (6.25)
$$
and
$$
R(E_2,\Delta_2,P^*)\ge\max\big(R(E_2,\Delta_2,P^*),\ I_{P,\bar Q_P}(X\wedge X^1,X^2)\big). \qquad (6.26)
$$

Since the inequalities have to be satisfied for each P from α(E1, P∗), (6.25) and (6.26) are equivalent to
$$
R(E_1,\Delta_1,P^*)\ge\max_{P\in\alpha(E_1,P^*)} I_{P,\bar Q_P}(X\wedge X^1) \qquad (6.27)
$$
and
$$
R(E_2,\Delta_2,P^*)\ge\max_{P\in\alpha(E_1,P^*)} I_{P,\bar Q_P}(X\wedge X^1,X^2). \qquad (6.28)
$$

Then, recalling (5.16) again, the inequalities (6.27) and (6.28) in turn hold for each P ∈ α(E1, P∗) iff
$$
R(E_1,\Delta_1,P^*)=\max_{P\in\alpha(E_1,P^*)} I_{P,\bar Q_P}(X\wedge X^1) \qquad (6.29)
$$
and meantime
$$
R(E_2,\Delta_2,P^*)\ge\max_{P\in\alpha(E_1,P^*)} I_{P,\bar Q_P}(X\wedge X^1,X^2). \qquad (6.30)
$$

Now, noting that the right-hand side of the last inequality does not depend on E2 and that the function R(E2, ∆2, P∗) is monotonically nondecreasing in E2, we arrive at the conclusion that (6.30) will be satisfied for $\bar Q_P$ meeting (6.29) iff $E_2\ge\hat E_2$, where
$$
R(\hat E_2,\Delta_2,P^*)=\max_{P\in\alpha(E_1,P^*)} I_{P,\bar Q_P}(X\wedge X^1,X^2). \qquad (6.31)
$$

It must also be noted that the successive refinement in the reliability sense in case of E2 ≥ E1 is not possible if
$$
\max_{P\in\alpha(E_1,P^*)} I_{P,\bar Q_P}(X\wedge X^1,X^2) > \max_{P} R(\Delta_2,P),
$$
where the right-hand side expression is the value of the zero-error rate-distortion function (5.19) for the second hierarchy.
Finally, note that letting E1 = E2 = E → 0 in (6.21) and (6.22) we obtain the conditions for the successive refinability in the distortion sense [70, 157, 187]. We quote those particularized conditions according to the theorem by Equitz and Cover [70].

Theorem 6.7. For the DMS with distribution P∗ and distortion measure d1 = d2 = d, the pair (R(∆1, P∗), R(∆2, P∗) − R(∆1, P∗)) is achievable iff there exists a conditional PD Q such that
$$
R(\Delta_1,P^*)=I_{P^*,Q}(X\wedge X^1), \qquad E_{P^*,Q}\,d(X,X^1)\le\Delta_1, \qquad (6.32)
$$
$$
R(\Delta_2,P^*)=I_{P^*,Q}(X\wedge X^2), \qquad E_{P^*,Q}\,d(X,X^2)\le\Delta_2, \qquad (6.33)
$$
and the RVs X, X², X¹ form a Markov chain in that order.

Now, resuming the discussions in this section, we may conclude that the successive refinement of information in the distortion sense is possible if and only if the Markovity condition of Theorem 6.7 together with (6.32) and (6.33) is fulfilled for the source. In the more natural case of E2 ≥ E1, the successive refinement of information under the reliability criterion is possible if and only if E2 is larger than the threshold defined by (6.31).
Meanwhile, it would be interesting to note that the successive refinement in the "purely" reliability sense, i.e., when ∆1 = ∆2 = 0, is always possible for E2 ≥ E1, since in that case Theorem 6.4 yields the achievability of the rates
$$
R_1=\max_{P\in\alpha(E_1,P^*)}H_P(X), \qquad R_1+R_2=\max_{P\in\alpha(E_2,P^*)}H_P(X),
$$
which are the corresponding values of the rate-reliability function (5.17).
7 Logarithmically Asymptotically Optimal Testing of Statistical Hypotheses

7.1 Prelude
This section serves as an illustration of the usefulness of combinatorial methods developed in information theory for the investigation of logarithmically asymptotically optimal (LAO) testing of statistical hypotheses (see also [49, 51, 54, 167]).
Applications of information-theoretical methods in mathematical
statistics are reflected in the monographs by Kullback [160], Csiszár
and Körner [51], Cover and Thomas [48], Blahut [35], Han [94], Ihara
[147], Chen and Alajaji [43], Csiszár and Shields [54]. Verdú's book [216] on multi-user detection is an important information theory reference where hypothesis testing is in service of information theory in its practical elaborations.
The paper by Dobrushin, Pinsker, and Shiryaev [59] presents a series of prospective problems on this subject. Blahut [34] used the results of statistical hypothesis testing for the solution of information-theoretical problems.
For series of independent experiments the well-known Stein's lemma [51] shows that for a given fixed first kind error probability $\alpha_1^{(N)}=\alpha_1$, the exponential rate of convergence to zero of the second kind error probability $\alpha_2^{(N)}$, when the number of experiments tends to infinity, is as follows:
$$
\lim_{N\to\infty} N^{-1}\log\alpha_2^{(N)}(\alpha_1)=-D(P_1\|P_2),
$$
where $D(P_1\|P_2)$ is the informational divergence (see Section 1.2) or, as it is also named in statistics, the distance, or Kullback-Leibler information, of the hypothetical distributions $P_1$ and $P_2$ defined on the finite set $\mathcal{X}$.
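As a quick numerical illustration (not part of the original text), the Stein exponent for a pair of binary PDs can be computed directly from the divergence formula; the two distributions below are illustrative choices, and the helper name `divergence` is ours.

```python
# A minimal sketch computing the Stein exponent D(P1||P2) for two
# hypothetical binary PDs; logarithms are to base 2, as in the text.
import math

def divergence(p, q):
    """Kullback-Leibler divergence D(P||Q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P1 = [0.10, 0.90]   # hypothetical distribution under H1
P2 = [0.65, 0.35]   # hypothetical distribution under H2

# Stein's lemma: the best second-kind error behaves as 2^(-N D(P1||P2)).
print(divergence(P1, P2))   # ~0.956 bits per observation
```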
In Section 7.2 we study the functional dependence of the first and the second kind error probabilities of the optimal tests for a sequence of experiments concerning a Markov chain, and in Section 7.3 the case of many hypotheses for independent experiments is considered.
This problem was investigated in many works, e.g., by Hoeffding [145], Csiszár and Longo [53], Birgé [32], Blahut [34], Han [92, 94], Haroutunian [134], Natarajan [177], Perez [179], Tusnády [207, 208], Fu and Shen [77], Tuncel [205], and analyzed in some of the noted books.
Various associated problems, notions, and results are presented by Berger [28], Ahlswede [10], Ahlswede and Wegener [21], Ahlswede and Csiszár [16], Anantharam [23], Bechhofer et al. [24], Burnashev et al. [37], Gutman [87], Han and Amari [95], Han and Kobayashi [97], Lyapunov [168], Lin'kov [163, 164, 165], Chen [42], Poor and Verdú [184], Puhalskii and Spokoiny [186], Verdú [214], Feder and Merhav [73], Levitan and Merhav [161], Zeitouni and Gutman [225], Ziv [229], Zhang and Berger [227], and others.
The results presented in Sections 7.2 and 7.3 were published in
[103, 104, 105, 106].
New developments in this direction by Ahlswede and Haroutunian
[18, 19], Ahlswede et al. [11], Haroutunian and Hakobyan [110, 111,
112, 113] and others are briefly discussed in Section 7.4.

7.2 Reliability Function for Two Alternative Hypotheses Concerning Markov Chains
Let $\mathbf{x}=(x_0,x_1,x_2,\dots,x_N)$, $x_n\in\mathcal{X}=\{1,2,\dots,I\}$, $\mathbf{x}\in\mathcal{X}^{N+1}$, $N=0,1,2,\dots$, be vectors of observed states of a simple homogeneous stationary Markov chain with finite number I of states. There are two competing hypotheses concerning the matrix of transition probabilities of the chain: $P_1=\{P_1(j|i)\}$ or $P_2=\{P_2(j|i)\}$, $i,j=\overline{1,I}$. In both cases [60] there exist corresponding stationary distributions $Q_1=\{Q_1(i)\}$ and $Q_2=\{Q_2(i)\}$, not necessarily unique, such that
$$
\sum_i Q_l(i)P_l(j|i)=Q_l(j), \qquad \sum_i Q_l(i)=1, \qquad l=1,2,\ j=\overline{1,I}.
$$

We shall use the following definition of the probability of the vector $\mathbf{x}\in\mathcal{X}^{N+1}$ of the Markov chain with transition probabilities $P_l$ and stationary distribution $Q_l$:
$$
Q_l\circ P_l^N(\mathbf{x})\overset{M}{=}Q_l(x_0)\prod_{n=1}^{N}P_l(x_n|x_{n-1}), \qquad l=1,2,
$$
$$
Q_l\circ P_l^N(\mathcal{A})\overset{M}{=}\sum_{\mathbf{x}\in\mathcal{A}}Q_l\circ P_l^N(\mathbf{x}), \qquad \mathcal{A}\subset\mathcal{X}^{N+1}.
$$

Based on the observed trajectory $\mathbf{x}$ the resolver device must adopt a decision about correctness of the first or the second hypothesis. Denote by $\alpha_1(\varphi_N)$ the error probability of the first kind of the test criterion $\varphi_N(\mathbf{x})$. It is the probability to reject the first hypothesis when in reality it is correct. Let us denote by $\mathcal{G}^N$ the set of vectors $\mathbf{x}$ for which the hypothesis $P_1$ is adopted:
$$
\mathcal{G}^N\overset{M}{=}\{\mathbf{x}:\ \varphi_N(\mathbf{x})=1\}.
$$
Then the error probability of the first kind $\alpha_1(\varphi_N)$ is
$$
\alpha_1(\varphi_N)\overset{M}{=}1-Q_1\circ P_1^N(\mathcal{G}^N),
$$
and the error probability of the second kind $\alpha_2(\varphi_N)$ is
$$
\alpha_2(\varphi_N)\overset{M}{=}Q_2\circ P_2^N(\mathcal{G}^N).
$$

A sequence of tests $\varphi_N(\mathbf{x})$, $N=1,2,\dots$, is called [32] logarithmically asymptotically optimal if for a given value $E_1>0$
$$
\alpha_1(\varphi_N)\le\exp\{-NE_1\},
$$
and the upper limit
$$
\varlimsup_{N\to\infty}-N^{-1}\log\alpha_2(\varphi_N)
$$
takes its maximal value, denoted by $E_2(E_1)$. By analogy with the notion introduced by Shannon into information theory (see (2.7), (2.12)), it is natural to refer to the function $E_2(E_1)$ also as a reliability function.
Let $P=\{P(j|i),\ i=\overline{1,I},\ j=\overline{1,I}\}$ be a matrix of transition probabilities of a stationary Markov chain with the same set of states $\mathcal{X}$, and let $Q=\{Q(i),\ i=\overline{1,I}\}$ be the corresponding stationary distribution. Let us denote by $D(Q\circ P\|Q_l\circ P_l)$ the Kullback-Leibler divergence of the distribution
$$
Q\circ P=\{Q(i)P(j|i),\ i=\overline{1,I},\ j=\overline{1,I}\}
$$
from the distribution
$$
Q_l\circ P_l=\{Q_l(i)P_l(j|i),\ i=\overline{1,I},\ j=\overline{1,I}\}, \qquad l=1,2,
$$
where
$$
D(Q\circ P\|Q_l\circ P_l)=\sum_{i,j}Q(i)P(j|i)\big[\log Q(i)P(j|i)-\log Q_l(i)P_l(j|i)\big]=D(Q\|Q_l)+D(Q\circ P\|Q\circ P_l),
$$
with
$$
D(Q\|Q_l)=\sum_i Q(i)\big[\log Q(i)-\log Q_l(i)\big], \qquad l=1,2.
$$

The main theorem of the section is the following.

Theorem 7.1. For any $E_1>0$ the LAO tests reliability function $E_2(E_1)$ for testing of two hypotheses $P_1$ and $P_2$ concerning Markov chains is given by the formula:
$$
E_2(E_1)=\inf_{Q:\,\exists Q_1,\ D(Q\|Q_1)<\infty}\ \inf_{P:\,D(Q\circ P\|Q\circ P_1)\le E_1}D(Q\circ P\|Q\circ P_2). \qquad (7.1)
$$

The reliability function has the following properties, which are consequences of the theorem.

(C1) If
$$
\inf_{Q_1}D(Q_1\circ P_1\|Q_1\circ P_2)<\infty,
$$
then
$$
E_2(E_1)\le\lim_{E_1\to 0}E_2(E_1)=\inf_{Q_1}D(Q_1\circ P_1\|Q_1\circ P_2).
$$

(C2) $E_2(E_1)$ monotonically decreases in $E_1$ for
$$
E_1>\inf_{Q}\ \inf_{P:\,D(Q\circ P\|Q\circ P_2)<\infty}D(Q\circ P\|Q\circ P_1),
$$
and for smaller $E_1$ we have $E_2(E_1)=\infty$.

(C3) If $E_1>\inf_{Q_2}D(Q_2\circ P_2\|Q_2\circ P_1)$, then $E_2(E_1)=0$. But if
$$
\inf_{Q_2}D(Q_2\circ P_2\|Q_2\circ P_1)=\infty,
$$
then $E_2(E_1)>0$ for $E_1$ arbitrarily large.


(C4) In the particular case when $P_1$ and $P_2$ have only positive components, the stationary distributions $Q_1$ and $Q_2$ are unique and have strictly positive components, and the function $E_2(E_1)$ takes the form:
$$
E_2(E_1)=\min_{P:\,D(Q\circ P\|Q\circ P_1)\le E_1}D(Q\circ P\|Q\circ P_2);
$$
moreover, the matrix $P$ also has positive components and $Q$ is its corresponding stationary distribution. This is a result of Natarajan [177] obtained by application of the large deviations technique.

(C5) In case of independent experiments with values on $\mathcal{X}$, for two possible distributions $P_1$ and $P_2$ and the PD $P$ of a sample, we obtain the following formula:
$$
E_2(E_1)=\min_{P:\,D(P\|P_1)\le E_1}D(P\|P_2).
$$
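For binary alphabets the minimization in (C5) is one-dimensional, so it can be approximated by a simple grid search over PDs P = (p, 1 − p). The following sketch (the routine and the two test distributions are our illustrative choices, not part of the original derivation) computes this dependence numerically.

```python
# Sketch of E_2(E_1) = min { D(P||P2) : D(P||P1) <= E_1 } for binary PDs,
# evaluated by grid search over P = (p, 1 - p).
import math

def divergence(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def E2(E1, P1, P2, grid=100_000):
    best = math.inf
    for k in range(grid + 1):
        P = [k / grid, 1 - k / grid]
        if divergence(P, P1) <= E1:
            best = min(best, divergence(P, P2))
    return best

P1, P2 = [0.10, 0.90], [0.65, 0.35]   # hypothetical PDs
for E1 in (0.05, 0.2, 0.5, 1.5):
    print(E1, round(E2(E1, P1, P2), 4))
# E_2 decreases in E_1 and reaches 0 once E_1 >= D(P2||P1) ~ 1.278,
# in agreement with properties (C2) and (C3).
```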

Proof. The proof of Theorem 7.1 consists of two parts. In the first part, by means of construction of a test sequence, it is proved that $E_2(E_1)$ is not less than the expression given in (7.1). In the second part it is proved that $E_2(E_1)$ cannot be greater than this expression. Let us name the second-order type of the vector $\mathbf{x}$ (cf. [87, 148]) the square matrix of $I^2$ relative frequencies $\{N(i,j)N^{-1},\ i=\overline{1,I},\ j=\overline{1,I}\}$ of the simultaneous appearance of the states $i$ and $j$ on pairs of neighboring places. It is clear that $\sum_{i,j}N(i,j)=N$. Denote by $\mathcal{T}^N_{Q\circ P}$ the set of vectors from $\mathcal{X}^{N+1}$ which have the type such that for some joint PD $Q\circ P$
$$
N(i,j)=NQ(i)P(j|i), \qquad i=\overline{1,I},\ j=\overline{1,I}.
$$
Note that if the vector $\mathbf{x}\in\mathcal{T}^N_{Q\circ P}$, then
$$
\sum_j N(i,j)=NQ(i),\ i=\overline{1,I}, \qquad \sum_i N(i,j)=NQ'(j),\ j=\overline{1,I},
$$
for a somewhat different PD $Q'$, but in accordance with the definition of $N(i,j)$ we have
$$
|NQ(i)-NQ'(i)|\le 1, \qquad i=\overline{1,I},
$$
and then in the limit, when $N\to\infty$, the distribution $Q$ coincides with $Q'$ and may be taken as stationary for the conditional PD $P$:
$$
\sum_i Q(i)P(j|i)=Q(j), \qquad j\in\mathcal{X}.
$$
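The second-order type is straightforward to compute in practice; the sketch below (the trajectory is an illustrative example of ours, not from the text) builds the matrix of relative frequencies N(i, j)/N from adjacent state pairs.

```python
# Sketch: the second-order type of a trajectory x = (x_0, ..., x_N), i.e.,
# the matrix of relative frequencies N(i,j)/N of adjacent state pairs.
from collections import Counter

def second_order_type(x, states):
    N = len(x) - 1                       # N transitions among N+1 observations
    pairs = Counter(zip(x[:-1], x[1:]))  # counts N(i, j)
    return {(i, j): pairs[(i, j)] / N for i in states for j in states}

x = (1, 2, 2, 1, 2, 1, 1, 2, 2, 2, 1)   # illustrative trajectory, I = 2
t = second_order_type(x, states=(1, 2))
print(t)                  # empirical joint PD Q(i)P(j|i)
print(sum(t.values()))    # the relative frequencies sum to 1
```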

The test $\varphi_N(\mathbf{x})$ can be given by the set $\mathcal{B}(E_1)$, the part of the space $\mathcal{X}^{N+1}$ in which the hypothesis $P_1$ is adopted. We shall verify that the test for which
$$
\mathcal{B}(E_1)=\bigcup_{Q,P:\ D(Q\circ P\|Q\circ P_1)\le E_1,\ \exists Q_1:\,D(Q\|Q_1)<\infty}\mathcal{T}^N_{Q\circ P} \qquad (7.2)
$$
will be asymptotically optimal for the given $E_1$.


Note that for $l=1,2$ the probability of $\mathbf{x}$ from $\mathcal{T}^N_{Q\circ P}$ can be written as
$$
Q_l\circ P_l^N(\mathbf{x})=Q_l(x_0)\prod_{i,j}P_l(j|i)^{NQ(i)P(j|i)}.
$$
From here, if $Q_l\circ P_l^N(\mathbf{x})>0$, then the PD $Q\circ P$ is absolutely continuous relative to the PD $Q_l\circ P_l$. On the contrary, if $Q_l\circ P_l^N(\mathbf{x})=0$, then the distribution $Q\circ P$ is not absolutely continuous relative to the measure $Q_l\circ P_l$ and in this case
$$
D(Q\circ P\|Q_l\circ P_l)=\infty,
$$
but since
$$
D(Q\circ P\|Q_l\circ P_l)=D(Q\circ P\|Q\circ P_l)+D(Q\|Q_l),
$$


then at least one of the summands is infinite. Note also that if $Q\circ P$ is absolutely continuous relative to $Q_l\circ P_l$, then
$$
Q_l\circ P_l^N(\mathcal{T}^N_{Q\circ P})=\exp\{-N(D(Q\circ P\|Q\circ P_l)+o(1))\},
$$
where
$$
o(1)=\max_i\big\{|N^{-1}\log Q_l(i)|:\ Q_l(i)>0\big\}, \qquad \max_i\big\{|N^{-1}\log Q_l(i)|:\ Q_l(i)>0\big\}\to 0,\ \ N\to\infty.
$$
Really, this is not difficult to verify taking into account that the number $|\mathcal{T}^N_{Q\circ P}|$ of vectors in $\mathcal{T}^N_{Q\circ P}$ is equal to
$$
\exp\Big\{-N\Big(\sum_{i,j}Q(i)P(j|i)\log P(j|i)+o(1)\Big)\Big\}.
$$

Now our goal is to determine
$$
\alpha_1(\varphi_N)=\max_{Q_1}Q_1\circ P_1^N\big(\overline{\mathcal{B}(E_1)}\big).
$$
By analogy with (1.2) the number of different types $\mathcal{T}^N_{Q\circ P}$ does not exceed $(N+1)^{|\mathcal{X}|^2}$; then
$$
\alpha_1(\varphi_N)\le\exp\{-N(E_1+o(1))\}.
$$
At the same time it is not difficult to verify that
$$
\alpha_2(\varphi_N)=\max_{Q_2}Q_2\circ P_2^N(\mathcal{B}(E_1))\le\exp\Big\{-N\Big(\min_{Q,P:\ D(Q\circ P\|Q\circ P_1)\le E_1,\ \exists Q_1:\,D(Q\|Q_1)<\infty}D(Q\circ P\|Q\circ P_2)+o(1)\Big)\Big\}.
$$
Therefore the proposed test possesses the necessary exponents of the error probabilities.
It is easy to see that the theorem is valid also for $E_1=\infty$, that is, for $\alpha_1(\varphi_N)=0$.
Coming to the second part of the proof, note that for any $\mathbf{x}\in\mathcal{T}^N_{Q\circ P}$ the probability $Q_l\circ P_l^N(\mathbf{x})$ is constant. Hence for the optimal test the corresponding set $\mathcal{B}'(E_1)$ contains only whole types $\mathcal{T}^N_{Q\circ P}$. Now we can conclude that only the proposed sequence of tests, defined by the sets $\mathcal{B}(E_1)$, is logarithmically asymptotically optimal.

Remark 7.1. The test given by the set $\mathcal{B}(E_1)$ is robust because it is the same for different alternative hypotheses $P_2$.

7.3 Multiple Hypotheses: Interdependence of Reliabilities

In this section we generalize the results of the previous section to the case of L > 2 hypotheses.
Let $\mathcal{X}=\{1,2,\dots,I\}$ again be a finite set of states of the stationary Markov chain. The hypotheses concern the matrices of the transition probabilities $P_l=\{P_l(j|i),\ i=\overline{1,I},\ j=\overline{1,I}\}$, $l=\overline{1,L}$. The stationarity of the chain provides the existence, for each $l=\overline{1,L}$, of a stationary distribution $Q_l=\{Q_l(i),\ i=\overline{1,I}\}$, not necessarily unique. On the base of the trajectory $\mathbf{x}=(x_0,x_1,\dots,x_N)$ of the $N+1$ observations the test accepts one of the hypotheses $H_l$, $l=\overline{1,L}$.
Let us denote by $\alpha_{l|r}^{(N)}(\varphi_N)$ the probability to accept the hypothesis $H_l$ under the condition that $H_r$, $r\ne l$, is true. For $l=r$ we denote by $\alpha_{r|r}^{(N)}(\varphi_N)$ the probability to reject the hypothesis $H_r$. It is clear that
$$
\alpha_{r|r}^{(N)}(\varphi_N)=\sum_{l\ne r}\alpha_{l|r}^{(N)}(\varphi_N), \qquad r=\overline{1,L}. \qquad (7.3)
$$
l6=r

This probability is called [36] the error probability of the rth kind of
the test φN . The quadratic matrix of L2 error probabilities A(φN ) =
(N )
{αl|r (φ), r = 1, L, l = 1, L} sometimes is called the power of the tests.
To every trajectory x the determined test φN results in a choice of
a hypothesis among L ones. So the space X N +1 will be divided into
L parts

GlN = {x, φN (x) = l}, l = 1, L,


and
$$
\alpha_{l|r}(\varphi_N)=Q_r\circ P_r^N(\mathcal{G}_l^N), \qquad r,l=\overline{1,L}.
$$
Denote
$$
E_{l|r}(\varphi)=\lim_{N\to\infty}-\frac1N\log\alpha_{l|r}(\varphi_N), \qquad r,l=\overline{1,L}. \qquad (7.4)
$$
We call the matrix $E=\{E_{l|r},\ r=\overline{1,L},\ l=\overline{1,L}\}$ the reliability matrix of the sequence of tests $\varphi$. The problem formulated by Prof. R. Dobrushin during a seminar at the Institute of Information Transmission Problems of the USSR Academy of Sciences in March 1987 consists in the determination of the best (in some sense) reliability matrix E which may be achieved for L known distributions.
Note that from definitions (7.3) and (7.4) it follows that
$$
E_{r|r}=\min_{l\ne r}E_{l|r}. \qquad (7.5)
$$

In case L = 2 there are only two parameters in the matrix E, because
$$
E_{1|1}=E_{2|1}, \qquad E_{1|2}=E_{2|2},
$$
so the problem lies in the determination of the maximum value of one of them (say $E_{1|2}$) as a function of the given value of the other ($E_{1|1}$).
Let us name a sequence of tests LAO if for a given family $E_{1|1},E_{2|2},\dots,E_{L-1|L-1}$ these numbers make the diagonal of the reliability matrix and the remaining $L^2-L+1$ components of it take the maximum possible values.
Let $P=\{P(j|i)\}$ be a matrix of transition probabilities of some stationary Markov chain with the same set $\mathcal{X}$ of states, and $Q=\{Q(i),\ i=\overline{1,I}\}$ be the corresponding stationary PD. Let us define the sets
$$
\mathcal{R}_l\overset{M}{=}\{Q\circ P:\ D(Q\circ P\|Q\circ P_l)\le E_{l|l},\ \exists Q_l:\ D(Q\|Q_l)<\infty\}, \qquad l=\overline{1,L-1},
$$
$$
\mathcal{R}_L\overset{M}{=}\{Q\circ P:\ D(Q\circ P\|Q\circ P_l)\ge E_{l|l},\ l=\overline{1,L-1}\},
$$
and introduce the functions:
$$
E^*_{l|l}(E_{l|l})\overset{M}{=}E_{l|l}, \qquad l=\overline{1,L-1},
$$
$$
E^*_{l|r}(E_{l|l})\overset{M}{=}\inf_{Q\circ P\in\mathcal{R}_l}D(Q\circ P\|Q\circ P_r), \qquad r=\overline{1,L},\ r\ne l,\ l=\overline{1,L-1},
$$
$$
E^*_{L|r}(E_{1|1},\dots,E_{L-1|L-1})\overset{M}{=}\inf_{Q\circ P\in\mathcal{R}_L}D(Q\circ P\|Q\circ P_r), \qquad r=\overline{1,L-1},
$$
$$
E^*_{L|L}(E_{1|1},\dots,E_{L-1|L-1})\overset{M}{=}\min_{l=\overline{1,L-1}}E^*_{l|L}.
$$

The minimum on the void set will always be taken equal to infinity. Name the following conditions the compatibility conditions:
$$
0<E_{1|1}<\min\big[\inf_{Q_r}D(Q_r\circ P_r\|Q_r\circ P_1),\ r=\overline{2,L}\big],
$$
$$
\cdots\cdots\cdots
$$
$$
0<E_{l|l}<\min\Big[E^*_{r|l}(E_{r|r}),\ r=\overline{1,l-1};\ \inf_{Q_r}D(Q_r\circ P_r\|Q_r\circ P_l),\ r=\overline{l+1,L}\Big], \qquad l=\overline{2,L-1}.
$$

Theorem 7.2. (1) If the family of finite positive numbers $E_{1|1},\dots,E_{L-1|L-1}$ verifies the compatibility conditions, then there exists a LAO sequence of tests, the reliability matrix of which is defined by the functions $E^*_{l|r}$, and all elements of this matrix are strictly positive.
(2) If the compatibility conditions are violated, then for any tests at least one element of the reliability matrix will be equal to zero, that is, the corresponding error probability will not decrease exponentially.

The proof of this theorem is a simple combination of the proofs of Theorem 7.1 and of Theorem 7.3 below concerning L hypotheses for i.i.d. experiments. The differences in notations are the following: L PDs $P_1,\dots,P_L$ are given, the trajectory of observed results of N experiments is $\mathbf{x}=(x_1,\dots,x_N)$, and its probability is
$$
P_r^N(\mathbf{x})=\prod_{n=1}^{N}P_r(x_n), \qquad r=\overline{1,L}.
$$
Now let us redefine several notions and notations, used throughout, somewhat differently from the ones of Section 1.4. We call the type $P_{\mathbf{x}}$ of a sample $\mathbf{x}$ (or sample PD) the vector
$$
\big(N(1|\mathbf{x})/N,\dots,N(I|\mathbf{x})/N\big),
$$
where $N(i|\mathbf{x})$ is the number of repetitions of the state $i$ in $\mathbf{x}$. The set of vectors $\mathbf{x}$ of a given type $P$ will be denoted by $\mathcal{T}_P^N$; $\mathcal{P}^{(N)}$ denotes the set of all possible types of samples of length N.
Let us introduce, for given positive and finite numbers $E_{1|1},\dots,E_{L-1|L-1}$, the following notations:
$$
\mathcal{R}_l=\{P:\ D(P\|P_l)\le E_{l|l}\}, \qquad l=\overline{1,L-1},
$$
$$
\mathcal{R}_L=\{P:\ D(P\|P_l)>E_{l|l},\ l=\overline{1,L-1}\}, \qquad (7.6)
$$
$$
\mathcal{R}_l^{(N)}=\mathcal{R}_l\cap\mathcal{P}^{(N)}, \qquad l=\overline{1,L},
$$
$$
E^*_{l|l}=E^*_{l|l}(E_{l|l})=E_{l|l}, \qquad l=\overline{1,L-1},
$$
$$
E^*_{l|r}=E^*_{l|r}(E_{l|l})=\inf_{P\in\mathcal{R}_l}D(P\|P_r), \qquad r=\overline{1,L},\ r\ne l,\ l=\overline{1,L-1},
$$
$$
E^*_{L|r}=E^*_{L|r}(E_{1|1},\dots,E_{L-1|L-1})=\inf_{P\in\mathcal{R}_L}D(P\|P_r), \qquad r=\overline{1,L-1},
$$
$$
E^*_{L|L}=E^*_{L|L}(E_{1|1},\dots,E_{L-1|L-1})=\min_{l=\overline{1,L-1}}E^*_{l|L}. \qquad (7.7)
$$
Note that the parameter $E^*_{r|l}$ may be equal to infinity; this may occur when some measures $P_l$ are not absolutely continuous relative to some others.
Theorem 7.2 admits the following form.

Theorem 7.3. Let the family of various PDs $P_1,\dots,P_L$ be given on the finite set $\mathcal{X}$. For $L-1$ positive finite numbers $E_{1|1},\dots,E_{L-1|L-1}$, fulfilment of the strict inequalities
$$
0<E_{1|1}<\min_{l=\overline{2,L}}D(P_l\|P_1),
$$
$$
\cdots\cdots\cdots \qquad (7.8)
$$
$$
0<E_{r|r}<\min\Big[\min_{l=\overline{1,r-1}}E^*_{l|r}(E_{l|l}),\ \min_{l=\overline{r+1,L}}D(P_l\|P_r)\Big], \qquad r=\overline{2,L-1},
$$
is necessary and sufficient for the existence of a LAO sequence of tests with the reliability matrix $E^*=(E^*_{l|r})$, $r=\overline{1,L}$, $l=\overline{1,L}$, all components of which are strictly positive and are determined in (7.7).

We call the family $E_{1|1},\dots,E_{L-1|L-1}$ satisfying conditions (7.8) compatible.
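For binary alphabets the starred quantities in (7.7) can be approximated numerically by scanning all binary PDs. The routine below is an illustrative grid-search sketch (the three PDs and the diagonal values are our choices), not the analytic construction behind the theorem.

```python
# A grid-search sketch of the reliability matrix E* of (7.7) for binary
# PDs. R_l = {P : D(P||P_l) <= E_{l|l}} for l < L; R_L collects the PDs
# violating all those constraints; E*_{l|r} = inf_{P in R_l} D(P||P_r).
import math

def divergence(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def lao_matrix(PDs, diag, grid=20_000):
    L = len(PDs)
    E = [[math.inf] * L for _ in range(L)]
    for k in range(grid + 1):
        P = [k / grid, 1 - k / grid]
        d = [divergence(P, Pl) for Pl in PDs]
        for l in range(L):
            # membership of P in R_{l+1} (l < L-1) or in R_L (l = L-1)
            if l < L - 1:
                member = d[l] <= diag[l]
            else:
                member = all(d[j] > diag[j] for j in range(L - 1))
            if member:
                for r in range(L):
                    E[l][r] = min(E[l][r], d[r])
    for l in range(L - 1):
        E[l][l] = diag[l]                                 # E*_{l|l} = E_{l|l}
    E[L - 1][L - 1] = min(E[l][L - 1] for l in range(L - 1))  # last line of (7.7)
    return E

PDs = [[0.10, 0.90], [0.65, 0.35], [0.45, 0.55]]   # hypothetical PDs
for row in lao_matrix(PDs, diag=[0.05, 0.05]):     # illustrative E_{l|l}
    print([round(v, 3) for v in row])
```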

Remark 7.2. When the hypotheses are reindexed, the theorem remains valid with corresponding changes in conditions (7.8).

Remark 7.3. The maximum likelihood test adopts the hypothesis which maximizes the probability of the sample $\mathbf{x}$, that is, $r^*=\arg\max_r P_r^N(\mathbf{x})$. Meanwhile, simultaneously
$$
r^*=\arg\min_r D(P_{\mathbf{x}}\|P_r),
$$
that is, the maximum likelihood principle is equivalent to the minimum divergence principle.
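This equivalence follows from $\log P_r^N(\mathbf{x})=-N\big(H(P_{\mathbf{x}})+D(P_{\mathbf{x}}\|P_r)\big)$, where the entropy term does not depend on r, and is easy to check numerically; the sample and PDs in the sketch below are illustrative choices of ours.

```python
# Sketch checking Remark 7.3: arg max_r P_r^N(x) = arg min_r D(P_x||P_r).
import math
from collections import Counter

PDs = [[0.10, 0.90], [0.65, 0.35], [0.45, 0.55]]   # hypothetical PDs
x = (0, 1, 1, 0, 0, 1, 0, 0, 1, 0)                 # illustrative sample

def divergence(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def log_likelihood(P, x):
    return sum(math.log2(P[s]) for s in x)

counts = Counter(x)
Px = [counts[0] / len(x), counts[1] / len(x)]      # the type of x

ml = max(range(len(PDs)), key=lambda r: log_likelihood(PDs[r], x))
md = min(range(len(PDs)), key=lambda r: divergence(Px, PDs[r]))
print(ml, md, ml == md)   # both decision rules pick the same hypothesis
```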

The rest of the section is devoted to an overview of works on hypothesis testing for sources with side information.
In the paper [111] Haroutunian and Hakobyan studied the matrix of asymptotic interdependencies (reliability-reliability functions) of all possible pairs of the error probability exponents (reliabilities) in testing of multiple statistical hypotheses for an arbitrarily varying object with the current state sequence known to the statistician. The case of two hypotheses when the state sequences are not known to the decision maker was studied by Fu and Shen [77], and the case when the decision is made on the base of known state sequences was considered by Ahlswede et al. [11]. In the same way as Fu and Shen, from the main result the rate-reliability and the reliability-rate functions for arbitrarily varying source coding with side information were obtained in [111] (cf. Section 5.2).
7.4 Statistical Hypothesis Optimal Testing and Identification for Many Objects
In this section we briefly present some new results and problems in statistics which are generalizations of those exposed in Sections 7.1–7.3. The identification or hypothesis testing problems are considered for models with $K(\ge 1)$ random objects, each having one of $L(\ge 2)$ PDs.
Let $X_k=(X_{k,n},\ n=\overline{1,N})$, $k=\overline{1,K}$, be K sequences of N discrete i.i.d. RVs representing possible results of N observations, respectively, for each of K randomly functioning objects. For $k=\overline{1,K}$, $n=\overline{1,N}$, $X_{k,n}$ assumes values $x_{k,n}$ in the finite set $\mathcal{X}$ of cardinality $|\mathcal{X}|$. Let $\mathcal{P}(\mathcal{X})$ be the space of all possible PDs on $\mathcal{X}$. There are $L(\ge 2)$ PDs $P_1,\dots,P_L$ from $\mathcal{P}(\mathcal{X})$ under inspection, some of which are assigned to the vectors $X_1,\dots,X_K$. This assignment is unknown and must be determined on the base of N-samples (results of N independent observations) $\mathbf{x}_k=(x_{k,1},\dots,x_{k,N})$, where $x_{k,n}$ is the result of the nth observation of the kth object.
When L = K and all objects are different (no two objects can have the same PD), there are K! alternative versions of decisions. When objects are independent, there are $L^K$ possible combinations.
Bechhofer et al. presented investigations on sequential multiple-decision procedures in the book [24], which is concerned principally with a particular class of problems referred to as ranking problems.
Chapter 10 of the book by Ahlswede and Wegener [21] is devoted to statistical identification and ranking problems.
We consider models from [21] and [24] and variations of these models inspired by the pioneering papers by Ahlswede and Dueck [17] and by Ahlswede [10], with application of the optimality concept developed in Sections 7.1–7.3 for the models with K = 1.
Consider the following family of error probabilities of a test:
$$
\alpha^{(N)}_{l_1,l_2,\dots,l_K|m_1,m_2,\dots,m_K}, \qquad (m_1,m_2,\dots,m_K)\ne(l_1,l_2,\dots,l_K),\ \ m_k,l_k=\overline{1,L},\ k=\overline{1,K},
$$
which are the probabilities of the decision $l_1,l_2,\dots,l_K$ when the actual indices of the distributions of the objects are, respectively, $m_1,m_2,\dots,m_K$. The probabilities to reject all K hypotheses when they are true are
$$
\alpha^{(N)}_{l_1,l_2,\dots,l_K|l_1,l_2,\dots,l_K}=\sum_{(m_1,m_2,\dots,m_K)\ne(l_1,l_2,\dots,l_K)}\alpha^{(N)}_{l_1,l_2,\dots,l_K|m_1,m_2,\dots,m_K}.
$$

To study the exponential decrease of the error probabilities when the number of observations N increases, we define the reliabilities
$$
\lim_{N\to\infty}-\frac1N\log\alpha^{(N)}_{l_1,l_2,\dots,l_K|m_1,m_2,\dots,m_K}=E_{l_1,l_2,\dots,l_K|m_1,m_2,\dots,m_K}\ge 0. \qquad (7.9)
$$
In the papers [18] and [19] it was shown that questions arise in different models of statistical identification. The following models and problems were formulated, and a part of them was solved.

(1) The K objects are different; they have distinct PDs among $L\ge K$ possibilities. As a starting point, the problem of hypotheses LAO testing for the case K = 2, L = 2 was treated in [18] and [19].
(2) The K objects are independent, that is, they may also follow identical PDs. The problem is the same. An example for K, L = 2 was considered in [18] and [19]. It is surprising, but this model was not studied before the paper [18].
(3) For the model with one object, K = 1, and L possible PDs the question is whether the lth distribution occurred or not. This is the problem of identification of PDs in the spirit of the paper [17].
(4) The ranking (or ordering) problem [10]. Having one vector of observations $X=(X_1,X_2,\dots,X_N)$ and L hypothetical PDs, the receiver wants to know whether the index of the true PD of the object is in $\{1,2,\dots,r\}$ or in $\{r+1,\dots,L\}$.
(5) r-identification of PD [10]. Again K = 1. One wants to identify the observed object as a member either of the subset $\mathcal{S}$ of $\{1,2,\dots,L\}$ or of its complement, with r being the number of elements in $\mathcal{S}$.
In what follows we survey the results related to the problems formulated above. In the papers [18] and [19] the problem concerning r-identification (which proved to be equivalent to the ranking problem) is completely solved. The full solution of the problem of LAO identification of the PD of an object (model 3 noted above) was given in the same papers; we are going to expose it in the sequel.
First, it is necessary to formulate our meaning of the LAO identification problem for one object. There are $L\ge 2$ known possible PDs. The identification is the answer to the question whether the rth PD occurred or not. As in the testing problem, this answer must be given on the base of a sample $\mathbf{x}$ employing a test $\varphi_N(\mathbf{x})$.
There are two error probabilities for each $r=\overline{1,L}$: the probability $\alpha_{l\ne r|m=r}(\varphi_N)$ to accept an l different from r when r is in reality, and the probability $\alpha_{l=r|m\ne r}(\varphi_N)$ that r is accepted when it is not correct.
The probability $\alpha_{l\ne r|m=r}(\varphi_N)$ is already known: it coincides with the probability $\alpha_{r|r}(\varphi_N)$, which is equal to $\sum_{l:l\ne r}\alpha_{l|r}(\varphi_N)$. The corresponding reliability $E_{l\ne r|m=r}(\varphi)$ is equal to $E_{r|r}(\varphi)$, which satisfies the equality (7.5).
And what is the reliability approach to identification? It is necessary to determine the optimal dependence of $E^*_{l=r|m\ne r}$ upon the given $E^*_{l\ne r|m=r}=E^*_{r|r}$, which can be assigned a value satisfying conditions (7.8).
We need to involve some a priori probabilities of the different hypotheses. Let us suppose that the hypotheses $P_1,\dots,P_L$ have, say, probabilities $\Pr(r)$, $r=\overline{1,L}$. The only constraint we shall use is that $\Pr(r)>0$, $r=\overline{1,L}$. We will see that the result formulated in the coming theorem does not depend on the values of $\Pr(r)$, $r=\overline{1,L}$, if they all are strictly positive.
Now we can make the following reasoning for each $r=\overline{1,L}$:
$$
\alpha^{(N)}_{l=r|m\ne r}=\frac{\Pr^{(N)}(m\ne r,\ l=r)}{\Pr(m\ne r)}=\frac{1}{\sum_{m:m\ne r}\Pr(m)}\sum_{m:m\ne r}\alpha^{(N)}_{r|m}\Pr(m).
$$
From here one can observe that for $r=\overline{1,L}$:
$$
E_{l=r|m\ne r}=\lim_{N\to\infty}\Big(-\frac1N\log\alpha^{(N)}_{l=r|m\ne r}\Big)=\lim_{N\to\infty}\frac1N\Big(\log\sum_{m:m\ne r}\Pr(m)-\log\sum_{m:m\ne r}\alpha^{(N)}_{r|m}\Pr(m)\Big)=\min_{m:m\ne r}E_{r|m}.
$$

By analogy with the consequence (C5) in Section 7.2 we can conclude (with $\mathcal{R}_r$ defined as in (7.6) for each r, including r = L) that for the values of $E_{r|r}$ from $\big(0,\ \min_{l:l\ne r}D(P_l\|P_r)\big)$
$$
E_{l=r|m\ne r}(E_{r|r})=\min_{m:m\ne r}\ \inf_{P\in\mathcal{R}_r}D(P\|P_m)=\min_{m:m\ne r}\ \inf_{P:\,D(P\|P_r)\le E_{r|r}}D(P\|P_m), \qquad r=\overline{1,L}. \qquad (7.10)
$$

This outcome is summarized in

Theorem 7.4. For the model with different distributions, under the condition that the probabilities of all L hypotheses are positive, the reliability $E_{l=r|m\ne r}$ for given $E_{l\ne r|m=r}=E_{r|r}$ is defined by (7.10).
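A numerical sketch of (7.10) for binary alphabets: the infimum can again be approximated by a grid search over binary PDs. The five distributions used below anticipate the example that follows; the routine itself is our illustrative construction, not part of the original text.

```python
# A grid-search sketch of (7.10): the identification reliability
#   E_{l=r|m!=r}(E_{r|r}) = min_{m != r} inf { D(P||P_m) : D(P||P_r) <= E_{r|r} }.
import math

def divergence(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def identification_reliability(r, E_rr, PDs, grid=20_000):
    best = math.inf
    for k in range(grid + 1):
        P = [k / grid, 1 - k / grid]
        if divergence(P, PDs[r]) <= E_rr:
            best = min(best, min(divergence(P, PDs[m])
                                 for m in range(len(PDs)) if m != r))
    return best

PDs = [[0.10, 0.90], [0.65, 0.35], [0.45, 0.55], [0.85, 0.15], [0.23, 0.77]]
# E_{r|r} must stay below min_l D(P_l||P_r) (here ~0.103 for r = 0).
for E_rr in (0.02, 0.05, 0.08):
    print(E_rr, round(identification_reliability(0, E_rr, PDs), 4))
```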

For a good perception of the theory it is pertinent to discuss an example with the set $\mathcal{X}=\{0,1\}$ having only two elements. Let the following five PDs be given on $\mathcal{X}$:
$$
P_1=\{0.10,\,0.90\},\quad P_2=\{0.65,\,0.35\},\quad P_3=\{0.45,\,0.55\},\quad P_4=\{0.85,\,0.15\},\quad P_5=\{0.23,\,0.77\}.
$$

In Figure 7.1 the results of calculations of $E_{l=r|m\ne r}$ as a function of $E_{l\ne r|m=r}$ are presented.

Fig. 7.1 The function $E_{l=r|m\ne r}$ for different values of r.

The elements of the matrix of divergences of all pairs of the distributions are used for calculation of conditions (7.8) for this example:
$$
\{D(P_m\|P_l)\}_{m,l=\overline{1,5}}=
\begin{pmatrix}
0 & 0.956 & 0.422 & 2.018 & 0.082\\
1.278 & 0 & 0.117 & 0.176 & 0.576\\
0.586 & 0.120 & 0 & 0.618 & 0.169\\
2.237 & 0.146 & 0.499 & 0 & 1.249\\
0.103 & 0.531 & 0.151 & 1.383 & 0
\end{pmatrix}.
$$
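These entries are easy to reproduce; the following sketch recomputes $\{D(P_m\|P_l)\}$ for the five PDs above (base-2 logarithms assumed, matching the tabulated values).

```python
# Sketch reproducing the divergence matrix {D(P_m||P_l)} of the example.
import math

def divergence(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

PDs = [[0.10, 0.90], [0.65, 0.35], [0.45, 0.55], [0.85, 0.15], [0.23, 0.77]]
for Pm in PDs:
    print(["%.3f" % divergence(Pm, Pl) for Pl in PDs])
# First row: 0.000 0.956 0.422 2.018 0.082, matching the matrix above.
```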

In Figures 7.2 and 7.3 the results of calculations of the same depen-
dence are presented for four distributions taken from the previous five.
In [113] Haroutunian and Hakobyan solved the problem of identification of distributions for two independent objects, which will be exposed in the coming lines.
We begin with a lemma from [110] on LAO testing for two independent objects and L hypotheses concerning each of them. Let a sequence of compound tests $\Phi=(\varphi^1,\varphi^2)$ consist of the pair of sequences of tests $\varphi^1$ and $\varphi^2$ for the respective separate objects.

Fig. 7.2 The function $E_{l=r|m\ne r}$ for four distributions taken from the five.

Lemma 7.5. If the elements $E_{l|m}(\varphi^i)$, $m,l=\overline{1,L}$, $i=1,2$, are strictly positive, then the following equalities hold:
$$
E_{l_1,l_2|m_1,m_2}(\Phi)=\sum_{i=1}^{2}E_{l_i|m_i}(\varphi^i), \quad \text{if } m_1\ne l_1,\ m_2\ne l_2, \qquad (7.11)
$$
$$
E_{l_1,l_2|m_1,m_2}(\Phi)=E_{l_i|m_i}(\varphi^i), \quad \text{if } m_{3-i}=l_{3-i},\ m_i\ne l_i,\ i=1,2. \qquad (7.12)
$$
The LAO test $\Phi^*$ is the compound test, and for it the equalities (7.11) and (7.12) are valid.
For identification the statistician has to answer the question whether the pair of distributions $(r_1,r_2)$ occurred or not. Let us consider two types of error probabilities for each pair $(r_1,r_2)$, $r_1,r_2=\overline{1,L}$. We denote by $\alpha^{(N)}_{(l_1,l_2)\ne(r_1,r_2)|(m_1,m_2)=(r_1,r_2)}$ the probability that the pair $(r_1,r_2)$ is true, but it is rejected. Note that this probability is equal to $\alpha^{(N)}_{r_1,r_2|r_1,r_2}$. Let $\alpha^{(N)}_{(l_1,l_2)=(r_1,r_2)|(m_1,m_2)\ne(r_1,r_2)}$ be the probability that $(r_1,r_2)$ is accepted when it is not correct. The corresponding reliabilities are $E_{(l_1,l_2)\ne(r_1,r_2)|(m_1,m_2)=(r_1,r_2)}=E_{r_1,r_2|r_1,r_2}$ and $E_{(l_1,l_2)=(r_1,r_2)|(m_1,m_2)\ne(r_1,r_2)}$. Our aim is to determine the dependence of $E_{(l_1,l_2)=(r_1,r_2)|(m_1,m_2)\ne(r_1,r_2)}$ on the given $E_{r_1,r_2|r_1,r_2}(\Phi_N)$.

Fig. 7.3 The function $E_{l=r|m\ne r}$ for another four distributions.
Now let us suppose that the hypotheses $P_1,P_2,\dots,P_L$ for the two objects have a priori positive probabilities $\Pr(r_1,r_2)$, $r_1,r_2=\overline{1,L}$, and consider the probability which we are interested in:
$$
\alpha^{(N)}_{(l_1,l_2)=(r_1,r_2)|(m_1,m_2)\ne(r_1,r_2)}=\frac{\Pr^N\big((m_1,m_2)\ne(r_1,r_2),\ (l_1,l_2)=(r_1,r_2)\big)}{\Pr\big((m_1,m_2)\ne(r_1,r_2)\big)}=\frac{\displaystyle\sum_{(m_1,m_2)\ne(r_1,r_2)}\alpha^{(N)}_{(r_1,r_2)|(m_1,m_2)}\Pr(m_1,m_2)}{\displaystyle\sum_{(m_1,m_2)\ne(r_1,r_2)}\Pr(m_1,m_2)}.
$$

Consequently, we obtain that
$$
E_{(l_1,l_2)=(r_1,r_2)|(m_1,m_2)\ne(r_1,r_2)}=\min_{(m_1,m_2):(m_1,m_2)\ne(r_1,r_2)}E_{r_1,r_2|m_1,m_2}. \qquad (7.13)
$$
For every LAO test $\Phi^*$, from (7.9), (7.11), (7.12), and (7.13) we obtain that
$$
E_{(l_1,l_2)=(r_1,r_2)|(m_1,m_2)\ne(r_1,r_2)}=\min_{m_1\ne r_1,\ m_2\ne r_2}\Big(E^{I}_{r_1|m_1}(E_{r_1|r_1}),\ E^{II}_{r_2|m_2}(E_{r_2|r_2})\Big), \qquad (7.14)
$$
where $E^{I}_{r_1|m_1}(E_{r_1|r_1})$ and $E^{II}_{r_2|m_2}(E_{r_2|r_2})$ are determined by (7.7) for, correspondingly, the first and the second objects. For every LAO test $\Phi^*$, from (7.9), (7.11), and (7.12) we deduce that
$$
E_{r_1,r_2|r_1,r_2}=\min_{m_1\ne r_1,\ m_2\ne r_2}\big(E^{I}_{r_1|m_1},\ E^{II}_{r_2|m_2}\big)=\min\big(E^{I}_{r_1|r_1},\ E^{II}_{r_2|r_2}\big), \qquad (7.15)
$$
and each of $E^{I}_{r_1|r_1}$, $E^{II}_{r_2|r_2}$ satisfies the following conditions:
$$
0<E^{I}_{r_1|r_1}<\min\Big[\min_{l=\overline{1,r_1-1}}E^{I}_{l|r_1}(E_{l|l}),\ \min_{l=\overline{r_1+1,L}}D(P_l\|P_{r_1})\Big], \qquad (7.16)
$$
$$
0<E^{II}_{r_2|r_2}<\min\Big[\min_{l=\overline{1,r_2-1}}E^{II}_{l|r_2}(E_{l|l}),\ \min_{l=\overline{r_2+1,L}}D(P_l\|P_{r_2})\Big]. \qquad (7.17)
$$

From (7.7) we see that the elements $E^{*I}_{l|m}(E^{I}_{l|l})$, $l=\overline{1,r_1-1}$, and $E^{*II}_{l|m}(E^{II}_{l|l})$, $l=\overline{1,r_2-1}$, are determined only by $E^{I}_{l|l}$ and $E^{II}_{l|l}$. But we are considering only the elements $E^{I}_{r_1|r_1}$ and $E^{II}_{r_2|r_2}$. We can use Stein's lemma formulated for L hypotheses and upper estimate in (7.16) and (7.17) as follows:
 
$$
0<E^{I}_{r_1|r_1}<\min\Big[\min_{l=\overline{1,r_1-1}}D(P_{r_1}\|P_l),\ \min_{l=\overline{r_1+1,L}}D(P_l\|P_{r_1})\Big], \qquad (7.18)
$$
$$
0<E^{II}_{r_2|r_2}<\min\Big[\min_{l=\overline{1,r_2-1}}D(P_{r_2}\|P_l),\ \min_{l=\overline{r_2+1,L}}D(P_l\|P_{r_2})\Big]. \qquad (7.19)
$$

Let us denote $r=\max(r_1,r_2)$ and $k=\min(r_1,r_2)$. From (7.15) we have that when $E_{r_1,r_2|r_1,r_2}=E^{I}_{r_1|r_1}$, then $E^{I}_{r_1|r_1}\le E^{II}_{r_2|r_2}$, and when $E_{r_1,r_2|r_1,r_2}=E^{II}_{r_2|r_2}$, then $E^{II}_{r_2|r_2}\le E^{I}_{r_1|r_1}$. Hence, the given strictly positive element $E_{r_1,r_2|r_1,r_2}$ must meet both inequalities (7.18) and (7.19), and the combination of these restrictions gives
$$
E_{r_1,r_2|r_1,r_2}<\min\Big[\min_{l=\overline{1,r-1}}D(P_r\|P_l),\ \min_{l=\overline{k+1,L}}D(P_l\|P_k)\Big]. \qquad (7.20)
$$

Using (7.15) and (7.20) we can determine the reliability $E_{(l_1,l_2)=(r_1,r_2)|(m_1,m_2)\ne(r_1,r_2)}$ as a function of $E_{r_1,r_2|r_1,r_2}$ as follows:
$$
E_{(l_1,l_2)=(r_1,r_2)|(m_1,m_2)\ne(r_1,r_2)}\big(E_{r_1,r_2|r_1,r_2}\big)=\min_{m_1\ne r_1,\ m_2\ne r_2}\Big(E_{r_1|m_1}(E_{r_1,r_2|r_1,r_2}),\ E_{r_2|m_2}(E_{r_1,r_2|r_1,r_2})\Big), \qquad (7.21)
$$
where $E_{r_1|m_1}(E_{r_1,r_2|r_1,r_2})$ and $E_{r_2|m_2}(E_{r_1,r_2|r_1,r_2})$ are determined by (7.7).
Finally we obtain

Theorem 7.6. If the distributions $P_l$, $l=\overline{1,L}$, are different and the given strictly positive number $E_{r_1,r_2|r_1,r_2}$ satisfies condition (7.20), then the reliability $E_{(l_1,l_2)=(r_1,r_2)|(m_1,m_2)\ne(r_1,r_2)}$ is defined in (7.21).
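For binary alphabets, (7.21) can again be evaluated by grid search: the identification reliability is the smallest single-object reliability at the common exponent $E_{r_1,r_2|r_1,r_2}$. The PDs and the exponent value below are illustrative choices anticipating the example that follows.

```python
# A grid-search sketch of Theorem 7.6 / (7.21) for two independent objects.
import math

def divergence(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def reliability(r, m, E_rr, PDs, grid=20_000):
    """E_{r|m}(E_{r|r}) = inf { D(P||P_m) : D(P||P_r) <= E_{r|r} }."""
    best = math.inf
    for k in range(grid + 1):
        P = [k / grid, 1 - k / grid]
        if divergence(P, PDs[r]) <= E_rr:
            best = min(best, divergence(P, PDs[m]))
    return best

PDs = [[0.10, 0.90], [0.85, 0.15], [0.23, 0.77]]   # hypothetical PDs
r1, r2, E = 0, 1, 0.05        # hypotheses for the objects and E_{r1,r2|r1,r2}
E_id = min(min(reliability(r1, m, E, PDs) for m in range(3) if m != r1),
           min(reliability(r2, m, E, PDs) for m in range(3) if m != r2))
print(E_id)   # E_{(l1,l2)=(r1,r2)|(m1,m2)!=(r1,r2)} per (7.21)
```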

The case with K = 2, L = 3 for the model of independent objects is studied in [112], and the problem for a model consisting of three or more independent objects is solved in the paper [110].
The following example is treated in the papers [106] and [110], concerning one object and two independent objects, respectively. Let the set $\mathcal{X}=\{0,1\}$ contain two elements and the following PDs be given on $\mathcal{X}$: $P_1=\{0.10,\,0.90\}$, $P_2=\{0.85,\,0.15\}$, $P_3=\{0.23,\,0.77\}$. In Figures 7.4 and 7.5 the results of calculations of the functions $E_{2,1|1,2}(E_{1,1|3,1},E_{2,2|2,3})$ (for two independent objects) and $E_{2|1}(E_{1|1})$ (for one object) are presented, respectively.

Fig. 7.4 $E_{2,1|1,2}(E_{1,1|3,1},E_{2,2|2,3})$ for two independent objects.

Fig. 7.5 $E_{2,1}(E_{1,1})$ for one object.
The problem of hypothesis LAO testing was resolved by Haroutunian and Yessayan for a model consisting of two different objects (the distributions of which cannot be the same) in the paper [125] for the case of three hypotheses (the aforementioned Problem 1), and for two statistically dependent objects in [126].
In the paper [109] Haroutunian and Grigoryan examined the problem of LAO testing of three or more hypotheses for a pair of simple homogeneous stationary Markov chains.
It was noted in [18] and [19] that the discussed problems and results may be extended in several directions. It is interesting to examine models from the point of view of remote statistical inference formulated by Berger [28]. It is necessary to study models which are described by more general classes of RVs and processes [42, 87, 93, 94, 177]. One of the directions is connected with the inference for compressed data [12, 22, 95, 227]. One may see perspectives in the application of the identification approach and methods to authentication theory [173] and steganography [38].
Basic Notations and Abbreviations

|a|+ max(a, 0)
⌊a⌋ integer part of the number a
AVC arbitrarily varying channel
AVS arbitrarily varying source
BC broadcast channel
BSC binary symmetric channel
C(W ) capacity of DMC W
C(W ) capacity of DMC W for average error
probability
C(E, W ) = R(E, W ) E-capacity or rate-reliability function
C0 (W ) zero error capacity of DMC W
CRP channel with random parameter
co(A) the convex hull of the set A
d distortion measure
d(x, x̂) average distortion measure of the vectors x and x̂
DCC discrete compound channel
DMC discrete memoryless channel
DMS discrete memoryless source
D(P kQ) divergence of PD P from Q


EP X expectation of the RV X
e(f, g, N, W ) the maximal error probability of N -block code
(f, g) for channel W
e(f, g, P ∗ , ∆, N ) error probability of DMS P ∗ N -block code (f, g)
subject to distortion ∆
E error probability exponent (reliability)
E(R, W ) reliability function of DMC W
Esp (R, W ) sphere packing bound for the reliability function
of DMC W
Er (R, W ) random coding bound for the reliability function
of DMC W
Ex (R, W ) expurgated bound for the reliability function of
DMC W
exp, log are to the base two
GCRP generalized channel with random parameter
GIFC general interference channel
g −1 (m) {y : g(y) = m}
HP (X) entropy of RV X with PD P
HP,V (Y |X) conditional entropy of RV Y for given RV X
with PD P and conditional PD V of Y
given X
IP,V (Y ∧ X) mutual information of RVs X and Y with PD P ◦ V
IQ,P,V (Y ∧ X | U ) conditional mutual information of RVs X and Y given U
IFC interference channel
L(N ) volume of source code
MAC multiple-access channel
MACRP multiple-access channel with random parameter
M message set
M number of messages of the set M
n = 1, N n = 1, 2, . . . , N
PD probability distribution
P(X ) set of all PD on X
PN (X ) subset of P(X ) consisting of the possible types
of sequences x ∈ X N
P, Q, V, W, . . . PD

P ◦ W {P ◦ W (x, y) = P (x)W (y|x), x ∈ X , y ∈ Y}
P W {P W (y) = Σx P (x)W (y|x), y ∈ Y}
R(f, g, N ) transmission rate of a code (f, g) of
blocklength N
R(∆, P ∗ ) rate-distortion function
R(E, ∆, P ∗ ) rate-reliability-distortion function of the
source with generic PD P ∗
RBH (E, ∆, P ∗ ) binary Hamming rate-reliability-
distortion function
TWC two-way channel
RTWC restricted two-way channel
RV random variable
T_P^N (X) set of vectors x of type P , also called type
T_{P,V}^N (Y |x) set of vectors y of conditional type V for given x of type P , also called V -shell of x
U, X, Y, S, . . . RV with values in U, X , Y, S, . . .
VN (Y, P ) set of all possible V -shells for x of type P
W :X →Y discrete channel with input alphabet X ,
output alphabet Y and matrix of
transition probabilities W
x vector (x1 , . . . , xN )
A, B, P, X , Y, U, R, S, Z, . . . finite sets
A⊂X A is a subset of X
|X | size of the set X
x∈X x is an element of the set X
X ×Y Cartesian product of the sets X and Y
XN the set of N -length sequences of elements
of X
X →Y →Z RV X, Y, Z form a Markov chain in this
order
$\overset{M}{=}$ is equal by definition
References

[1] R. F. Ahlswede, “On two-way communication channels and a problem


by Zarankiewicz,” in Transactions of the 6th Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, pp. 23–37,
Prague, 1971.
[2] R. F. Ahlswede, “Multi-way communication channels,” in 2nd International
Symposium on Information Theory, Tsahkadzor, Armenia, 1971, pp. 23–52,
Budapest: Akad. Kiado, 1973.
[3] R. F. Ahlswede, “The capacity region of a channel with two senders and two
receivers,” Annals of Probability, vol. 2, no. 2, pp. 805–814, 1974.
[4] R. F. Ahlswede, “Elimination of correlation in random codes for arbitrar-
ily varying channels,” Z. Wahrscheinlichkeitstheorie Verw. Gebiete, vol. 44,
pp. 186–194, 1978.
[5] R. F. Ahlswede, “Coloring hypergraphs: A new approach to multi-user source
coding,” Part I, Journal of Combinatorics, Information and System Sciences,
vol. 4, no. 1, pp. 75–115, 1979.
[6] R. F. Ahlswede, “Coloring hypergraphs: A new approach to multi-user source
coding,” Part II, Journal of Combinatorics, Information and System Sciences,
vol. 5, no. 2, pp. 220–268, 1980.
[7] R. F. Ahlswede, “The rate-distortion region for multiple descriptions with-
out excess rate,” IEEE Transactions on Information Theory, vol. 31, no. 6,
pp. 721–726, 1985.
[8] R. F. Ahlswede, “Arbitrarily varying channels with states sequence known to
the sender,” IEEE Transactions on Information Theory, vol. 32, no. 5, pp. 621–
629, 1986.


[9] R. F. Ahlswede, “Extremal properties of rate-distortion functions,” IEEE


Transactions on Information Theory, vol. 36, no. 1, pp. 166–171, 1990.
[10] R. F. Ahlswede, “General theory of information transfer,” Preprint 97-118,
Discrete Strukturen in der Mathematik, Universität Bielefeld, 1997.
[11] R. F. Ahlswede, E. Aloyan, and E. A. Haroutunian, “On logarithmically
asymptotically optimal hypothesis testing for arbitrarily varying source with
side information,” in Lecture Notes in Computer Science, Vol. 4123, General
Theory of Information Transfer and Combinatorics, pp. 457–461, Springer
Verlag, 2006.
[12] R. F. Ahlswede and M. Burnashev, “On minimax estimation in the presence
of side information about remote data,” Annals of Statistics, vol. 18, no. 1,
pp. 141–171, 1990.
[13] R. F. Ahlswede and N. Cai, “Arbitrarily varying multiple access channels,”
Part 1, Preprint 96-068, Discrete Strukturen in der Mathematik, Universität
Bielefeld, 1996.
[14] R. F. Ahlswede and N. Cai, “Arbitrarily varying multiple access channels,”
Part 2, Preprint 97-006, Discrete Strukturen in der Mathematik, Universität
Bielefeld, 1997.
[15] R. F. Ahlswede and N. Cai, “Correlated sources help transmission over an arbi-
trarily varying channel,” IEEE Transactions on Information Theory, vol. 43,
pp. 1254–1255, 1997.
[16] R. F. Ahlswede and I. Csiszár, “Hypothesis testing with communication con-
straints,” IEEE Transactions on Information Theory, vol. 32, pp. 533–542,
1986.
[17] R. F. Ahlswede and G. Dueck, “Identification via channels,” IEEE Transac-
tions on Information Theory, vol. 35, no. 1, pp. 15–29, 1989.
[18] R. F. Ahlswede and E. A. Haroutunian, “On statistical hypothesis optimal
testing and identification,” Transactions of the Institute for Informatics and
Automation Problems NAS of RA, Mathematical Problems of Computer Sci-
ence, vol. 24, pp. 16–33, 2005.
[19] R. F. Ahlswede and E. A. Haroutunian, “On logarithmically asymptotically
optimal testing of hypotheses and identification,” in Lecture Notes in Com-
puter Science, Vol. 4123, General Theory of Information Transfer and Com-
binatorics, pp. 462–478, Springer Verlag, 2006.
[20] R. F. Ahlswede and J. Körner, “Source coding with side information and a
converse for degraded broadcast channels,” IEEE Transactions on Information
Theory, vol. 21, pp. 629–637, November 1975.
[21] R. F. Ahlswede and I. Wegener, Search Problems. New York: J. Wiley-
Interscience, 1987. (German original, Teubner, Stuttgart, 1979, Russian trans-
lation, Mir, Moscow 1982).
[22] R. F. Ahlswede, E. Yang, and Z. Zhang, “Identification via compressed data,”
IEEE Transactions on Information Theory, vol. 43, no. 1, pp. 48–70, 1997.
[23] V. Anantharam, “A large deviations approach to error exponent in source
coding and hypothesis testing,” IEEE Transactions on Information Theory,
vol. 36, no. 4, pp. 938–943, 1990.

[24] R. E. Bechhofer, J. Kiefer, and M. Sobel, Sequential Identification and Ranking


Procedures. Chicago: The University of Chicago Press, 1968.
[25] R. Benzel, “The capacity region of a class of discrete additive degraded inter-
ference channels,” IEEE Transactions on Information Theory, vol. 25, no. 2,
pp. 228–231, 1979.
[26] T. Berger, Rate Distortion Theory: A Mathematical Basis for Data Compres-
sion. Englewood Cliffs, NJ: Prentice-Hall, 1971.
[27] T. Berger, “The source coding game,” IEEE Transactions on Information
Theory, vol. 17, no. 1, pp. 71–76, 1971.
[28] T. Berger, “Decentralized estimation and decision theory,” in Presented at
IEEE Seven Springs Workshop on Information Theory, Mt. Kisco, NY,
September 1979.
[29] T. Berger and Z. Zhang, “New results in binary multiple descriptions,” IEEE
Transactions on Information Theory, vol. 33, no. 4, pp. 502–521, 1987.
[30] T. Berger and Z. Zhang, “Multiple description source coding with no excess
marginal rate,” IEEE Transactions on Information Theory, vol. 41, no. 2,
pp. 349–357, 1995.
[31] P. P. Bergmans, “Random coding theorem for broadcast channels with
degraded components,” IEEE Transactions on Information Theory, vol. 9,
pp. 197–207, 1973.
[32] L. Birgé, “Vitesses maximales de décroissance des erreurs et tests optimaux
associés,” Z. Wahrscheinlichkeitstheorie Verw. Gebiete, vol. 55, pp. 261–273,
1981.
[33] D. Blackwell, L. Breiman, and A. J. Thomasian, “The capacity of a class of
channels,” Annals of Mathematical Statistics, vol. 30, no. 4, pp. 1229–1241,
1959.
[34] R. E. Blahut, “Hypothesis testing and information theory,” IEEE Transac-
tions on Information Theory, vol. 20, pp. 405–417, 1974.
[35] R. E. Blahut, Principles and Practice of Information Theory. Reading, MA:
Addison-Wesley, 1987.
[36] A. A. Borovkov, Mathematical Statistics. (in Russian), Nauka, Novosibirsk,
1997.
[37] M. V. Burnashev, S. Amari, and T. S. Han, “BSC: Testing of hypothesis with
information constraints,” in Numbers, Information and Complexity, (Althöfer,
ed.), Boston: Kluwer Academic Publishers, 2000.
[38] C. Cachin, “An information-theoretic model for steganography,” in Proceed-
ings of the 2nd Workshop on Information Hiding, (D. Aucsmith, ed.), in Lecture
Notes in Computer Science, Springer Verlag, 1998.
[39] A. B. Carleial, “A case where interference does not reduce capacity,” IEEE
Transactions on Information Theory, vol. 21, pp. 569–570, 1975.
[40] A. B. Carleial, “Interference channels,” IEEE Transactions on Information
Theory, vol. 24, no. 1, pp. 60–70, 1978.
[41] A. B. Carleial, “Outer bounds on the capacity of interference channels,” IEEE
Transactions on Information Theory, vol. 29, no. 4, pp. 602–604, 1983.

[42] P.-N. Chen, “General formula for the Neyman-Pearson type-II error exponent
subject to fixed and exponential type-I error bounds,” IEEE Transactions on
Information Theory, vol. 42, no. 1, pp. 316–323, 1996.
[43] P.-N. Chen and F. Alajaji, Lecture Notes in Information Theory. vol. I and
II, http://shannon.cm.nctu.edu.tw.
[44] M. H. M. Costa and A. El Gamal, “The capacity region of the discrete mem-
oryless interference channel with strong interference,” IEEE Transactions on
Information Theory, vol. 33, pp. 710–711, 1987.
[45] T. M. Cover, “Broadcast channels,” IEEE Transactions on Information The-
ory, vol. 18, no. 1, pp. 2–14, 1972.
[46] T. M. Cover, “An achievable rate region for the broadcast channel,” IEEE
Transactions on Information Theory, vol. 21, pp. 399–404, 1975.
[47] T. M. Cover, “Comments on broadcast channels,” IEEE Transactions on
Information Theory, vol. 44, pp. 2524–2530, 1998.
[48] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York:
Wiley, 1991.
[49] I. Csiszár, “The method of types,” IEEE Transactions on Information Theory,
vol. 44, no. 6, pp. 2505–2523, 1998.
[50] I. Csiszár and J. Körner, “Graph decomposition: A new key to coding theo-
rems,” IEEE Transactions on Information Theory, vol. 27, no. 1, pp. 5–12,
1981.
[51] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete
Memoryless Systems. New York: Academic Press, 1981. (Russian translation,
Mir, Moscow, 1985).
[52] I. Csiszár, J. Körner, and K. Marton, “A new look at the error exponent of
a discrete memoryless channel,” in Presented at the IEEE International Sym-
posium on Information Theory, Ithaca, NY: Cornell Univ., 1977. (Preprint).
[53] I. Csiszár and G. Longo, “On the error exponent for source coding and for
testing simple statistical hypotheses,” Studia Scientiarum Mathematicarum
Hungarica, vol. 6, pp. 181–191, 1971.
[54] I. Csiszár and P. Shields, “Information theory and statistics: A tutorial,” in Foundations and Trends in Communications and Information Theory, Hanover, MA, USA: now Publishers, 2004.
[55] A. Das and P. Narayan, “Capacities of time-varying multiple-access channels
with side information,” IEEE Transactions on Information Theory, vol. 48,
no. 1, pp. 4–25, 2002.
[56] R. L. Dobrushin, “Optimal information transmission in a channel with
unknown parameters,” (in Russian), Radiotekhnika Electronika, vol. 4,
pp. 1951–1956, 1959.
[57] R. L. Dobrushin, “Asymptotic bounds of the probability of error for the trans-
mission of messages over a memoryless channel with a symmetric transition
probability matrix,” (in Russian), Teorija Veroyatnost. i Primenen, vol. 7,
no. 3, pp. 283–311, 1962.
[58] R. L. Dobrushin, “Survey of Soviet research in information theory,” IEEE
Transactions on Information Theory, vol. 18, pp. 703–724, 1972.

[59] R. L. Dobrushin, M. S. Pinsker, and A. N. Shiryaev, “Application of the


notion of entropy in the problems of detecting a signal in noise,” (in Russian),
Lithuanian Mathematical Transactions, vol. 3, no. 1, pp. 107–122, 1963.
[60] J. L. Doob, Stochastic Processes. New York: Wiley, London: Chapman and
Hall, 1953.
[61] G. Dueck, “Maximal error capacity regions are smaller than average error
capacity regions for multi-user channels,” Problems of Control and Informa-
tion Theory, vol. 7, no. 1, pp. 11–19, 1978.
[62] G. Dueck, “The capacity region of two-way channel can exceed the inner
bound,” Information and Control, vol. 40, no. 3, pp. 258–266, 1979.
[63] G. Dueck and J. Körner, “Reliability function of a discrete memoryless channel
at rates above capacity,” IEEE Transactions on Information Theory, vol. 25,
no. 1, pp. 82–85, 1979.
[64] A. G. Dyachkov, “Random constant composition codes for multiple access
channels,” Problems of Control and Information on Theory, vol. 13, no. 6,
pp. 357–369, 1984.
[65] A. G. Dyachkov, “Lower bound to average by ensemble error probability for
multiple access channel,” (in Russian), Problems of Information on Transmis-
sion, vol. 22, no. 1, pp. 98–103, 1986.
[66] A. El Gamal and M. N. Costa, “The capacity region of a class of determinis-
tic interference channel,” IEEE Transactions on Information Theory, vol. 28,
no. 2, pp. 343–346, 1982.
[67] A. El Gamal and T. M. Cover, “Achievable rates for multiple descriptions,”
IEEE Transactions on Information Theory, vol. 28, no. 6, pp. 851–857, 1982.
[68] A. El Gamal and E. Van der Meulen, “A proof of Marton’s coding theorem
for the discrete memoryless broadcast channel,” IEEE Transactions on Infor-
mation Theory, vol. 27, pp. 120–122, 1981.
[69] P. Elias, “Coding for noisy channels,” IRE Convention Record, Part 4, pp. 37–
46, 1955.
[70] W. H. R. Equitz and T. M. Cover, “Successive refinement of information,”
IEEE Transactions on Information Theory, vol. 37, no. 2, pp. 269–275, 1991.
[71] T. Ericson, “Exponential error bounds for random codes in the arbitrarily
varying channels,” IEEE Transactions on Information Theory, vol. 31, no. 1,
pp. 42–48, 1985.
[72] R. M. Fano, Transmission of Information, A Statistical Theory of Communi-
cation. New York, London: Wiley, 1961.
[73] M. Feder and N. Merhav, “Universal composite hypothesis testing: A compet-
itive minimax approach,” IEEE Transactions on Information Theory, vol. 48,
no. 6, pp. 1504–1517, 2002.
[74] A. Feinstein, “A new basic theorem of information theory,” IRE Transactions
on Information Theory, vol. 4, pp. 2–22, 1954.
[75] A. Feinstein, Foundations of Information Theory. New York: McGraw-Hill,
1958.
[76] G. D. Forney, “Exponential error bounds for erasure, list and decision feedback
schemes,” IEEE Transactions on Information Theory, vol. 14, no. 2, pp. 206–
220, 1968.

[77] F. W. Fu and S. Y. Shen, “Hypothesis testing for arbitrarily varying source


with exponential-type constraint,” IEEE Transactions on Information Theory,
vol. 44, no. 2, pp. 892–895, 1998.
[78] R. G. Gallager, “A simple derivation of the coding theorems and some appli-
cations,” IEEE Transactions on Information Theory, vol. 11, no. 1, pp. 3–18,
1965.
[79] R. G. Gallager, Information Theory and Reliable Communication. New York:
Wiley, 1968.
[80] R. G. Gallager, “Capacity and coding for degraded broadcast channels,” (in
Russian), Problems of Information Transmission, vol. 10, no. 3, pp. 3–14,
1974.
[81] R. G. Gallager, “A perspective on multiaccess channels,” IEEE Transactions
on Information Theory, vol. 31, no. 1, pp. 124–142, 1985.
[82] R. G. Gallager, “Claude E. Shannon: A retrospective on his life, work, and
impact,” IEEE Transactions on Information Theory, vol. 47, no. 7, pp. 2681–
2695, 2001.
[83] A. E. Gamal, “The capacity of a class of broadcast channels,” IEEE Trans-
actions on Information Theory, vol. 25, no. 2, pp. 166–169, 1979.
[84] S. I. Gelfand, “Capacity of one broadcast channel,” (in Russian), Problems of Information Transmission, vol. 13, no. 3, pp. 106–108, 1977.
[85] S. I. Gelfand and M. S. Pinsker, “Capacity of broadcast channel with one
deterministic component,” (in Russian), Problems of Information Transmission, vol. 16, no. 1, pp. 24–34, 1980.
[86] S. I. Gelfand and M. S. Pinsker, “Coding for channel with random parame-
ters,” Problems of Control and Information Theory, vol. 8, no. 1, pp. 19–31,
1980.
[87] M. Gutman, “Asymptotically optimal classification for multiple test with
empirically observed statistics,” IEEE Transactions on Information Theory,
vol. 35, no. 2, pp. 401–408, 1989.
[88] B. E. Hajek and M. B. Pursley, “Evaluation of an achievable rate region for
the broadcast channel,” IEEE Transactions on Information Theory, vol. 25,
pp. 36–46, 1979.
[89] T. S. Han, “The capacity region of general multiple-access channel with certain
correlated sources,” Information and Control, vol. 40, no. 1, pp. 37–60, 1979.
[90] T. S. Han, “Slepian-Wolf-Cover theorem for networks of channels,” Informa-
tion and Control, vol. 47, no. 1, pp. 67–83, 1980.
[91] T. S. Han, “The capacity region for the deterministic broadcast channel with
a common message,” IEEE Transactions on Information Theory, vol. 27,
pp. 122–125, 1981.
[92] T. S. Han, “Hypothesis testing with multiterminal data compression,” IEEE
Transactions on Information Theory, vol. 33, no. 6, pp. 759–772, 1987.
[93] T. S. Han, “Hypothesis testing with the general source,” IEEE Transactions
on Information Theory, vol. 46, no. 7, pp. 2415–2427, 2000.
[94] T. S. Han, Information-Spectrum Methods in Information Theory. Berlin:
Springer Verlag, 2003.
[95] T. S. Han and S. Amari, “Statistical inference under multiterminal data com-
pression,” IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2300–
2324, 1998.
[96] T. S. Han and K. Kobayashi, “A new achievable region for the interference
channel,” IEEE Transactions on Information Theory, vol. 27, no. 1, pp. 49–60,
1981.
[97] T. S. Han and K. Kobayashi, “Exponential-type error probabilities for multi-
terminal hypothesis testing,” IEEE Transactions on Information Theory,
vol. 35, no. 1, pp. 2–13, 1989.
[98] E. A. Haroutunian, “Upper estimate of transmission rate for memoryless chan-
nel with countable number of output signals under given error probability
exponent,” in 3rd All-Union Conference on Theory of Information Trans-
mission and Coding, Uzhgorod, Publishing House of the Uzbek Academy of
Sciences, pp. 83–86, Tashkent, 1967. (in Russian).
[99] E. A. Haroutunian, “Estimates of the error probability exponent for a semicon-
tinuous memoryless channel,” (in Russian), Problems of Information Trans-
mission, vol. 4, no. 4, pp. 37–48, 1968.
[100] E. A. Haroutunian, “On the optimality of information transmission by a chan-
nel with finite number of states known at the input,” (in Russian), Izvestiya
Akademii Nauk Armenii, Matematika, vol. 4, no. 2, pp. 81–90, 1969.
[101] E. A. Haroutunian, “Error probability lower bound for the multiple-access
communication channels,” (in Russian), Problems of Information Transmis-
sion, vol. 11, no. 2, pp. 22–36, 1975.
[102] E. A. Haroutunian, “Combinatorial method of construction of the upper
bound for E-capacity,” (in Russian), Mezhvuz. Sbornik Nauchnykh Trudov,
Matematika, Yerevan, vol. 1, pp. 213–220, 1982.
[103] E. A. Haroutunian, “On asymptotically optimal testing of hypotheses concern-
ing Markov chain,” (in Russian), Izvestiya Akademii Nauk Armenii, Matem-
atika, vol. 23, no. 1, pp. 76–80, 1988.
[104] E. A. Haroutunian, “Asymptotically optimal testing of many statistical
hypotheses concerning Markov chain,” in 5th International Vilnius Conference
on Probability Theory and Mathematical Statistics, vol. 1(A–L), pp. 202–203,
1989.
[105] E. A. Haroutunian, “On asymptotically optimal criteria for Markov chains,”
(in Russian), First World Congress of Bernoulli Society, section 2, vol. 2,
no. 3, pp. 153–156, 1989.
[106] E. A. Haroutunian, “Logarithmically asymptotically optimal testing of mul-
tiple statistical hypotheses,” Problems of Control and Information Theory,
vol. 19, no. 5–6, pp. 413–421, 1990.
[107] E. A. Haroutunian, “On Bounds for E-capacity of DMC,” IEEE Transactions
on Information Theory, vol. 53, no. 11, pp. 4210–4220, 2007.
[108] E. A. Haroutunian and B. Belbashir, “Lower estimate of optimal transmission
rates with given error probability for discrete memoryless channel and for
asymmetric broadcast channel,” in 6th International Symposium on Infor-
mation Theory, pp. 19–21, Tashkent, 1984. (in Russian).
[109] E. A. Haroutunian and N. M. Grigoryan, “Reliability approach for testing of
many distributions for pair of Markov chains,” Transactions of the Institute for
Informatics and Automation Problems NAS of RA, Mathematical Problems of
Computer Science, vol. 29, pp. 89–96, 2007.
[110] E. A. Haroutunian and P. M. Hakobyan, “On multiple hypotheses LAO testing
for many independent objects,” In preparation.
[111] E. A. Haroutunian and P. M. Hakobyan, “On multiple hypothesis testing by
informed statistician for arbitrarily varying object and application to source
coding,” Transactions of the Institute for Informatics and Automation Prob-
lems NAS of RA, Mathematical Problems of Computer Science, vol. 23, pp. 36–
46, 2004.
[112] E. A. Haroutunian and P. M. Hakobyan, “On logarithmically asymptotically
optimal testing of three distributions for pair of independent objects,” Trans-
actions of the Institute for Informatics and Automation Problems NAS of RA,
Mathematical Problems of Computer Science, vol. 24, pp. 76–81, 2005.
[113] E. A. Haroutunian and P. M. Hakobyan, “On identification of distributions of
two independent objects,” Transactions of the Institute for Informatics and
Automation Problems of the NAS of RA, Mathematical Problems of Computer
Science, vol. 28, pp. 114–119, 2007.
[114] E. A. Haroutunian and A. N. Haroutunian, “The binary Hamming rate-
reliability-distortion function,” Transactions of the Institute for Informatics and
Automation Problems NAS of RA and YSU, Mathematical Problems of Com-
puter Science, vol. 18, pp. 40–45, 1997.
[115] E. A. Haroutunian, A. N. Haroutunian, and A. R. Kazarian (Ghazaryan),
“On rate-reliabilities-distortions function of source with many receivers,” in
Proceedings of the Joint Session of the 6th Prague Symposium on Asymptotic
Statistics and the 13th Prague Conference on Information Theory, Statistical
Decision Functions and Random Processes, vol. 1, pp. 217–220, Prague, 1998.
[116] E. A. Haroutunian and M. E. Haroutunian, Information Theory. (in Arme-
nian), Yerevan State University, p. 104, 1987.
[117] E. A. Haroutunian and M. E. Haroutunian, “Channel with random parame-
ter,” in Proceedings of 12th Prague Conference on Information Theory, Sta-
tistical Decision Functions and Random Processes, p. 20, 1994.
[118] E. A. Haroutunian and M. E. Haroutunian, “Bounds of E-capacity region for
restricted two-way channel,” (in Russian), Problems of Information Transmis-
sion, vol. 34, no. 3, pp. 7–16, 1998.
[119] E. A. Haroutunian, M. E. Haroutunian, and A. E. Avetissian, “Multiple-
access channel achievable rates region and reliability,” Izvestiya Akademii
Nauk Armenii, Matematika, vol. 27, no. 5, pp. 51–67, 1992.
[120] E. A. Haroutunian and A. N. Harutyunyan, “Successive refinement of infor-
mation with reliability criterion,” in Proceedings of IEEE International Sym-
posium on Information Theory, p. 205, Sorrento, Italy, 2000.
[121] E. A. Haroutunian, A. N. Harutyunyan, and A. R. Ghazaryan, “On rate-
reliability-distortion function for robust descriptions system,” IEEE Transac-
tions on Information Theory, vol. 46, no. 7, pp. 2690–2697, 2000.
[122] E. A. Haroutunian and A. R. Kazarian, “On cascade system coding rates
with respect to distortion criteria and reliability,” Transactions of the Institute
for Informatics and Automation Problems NAS of RA and YSU, Mathematical
Problems of Computer Science, vol. 18, pp. 19–32, 1997.
[123] E. A. Haroutunian and R. S. Maroutian, “(E, ∆)-achievable rates for multiple
descriptions of random varying source,” Problems of Control and Information
Theory, vol. 20, no. 2, pp. 165–178, 1991.
[124] E. A. Haroutunian and B. Mekoush, “Estimates of optimal rates of codes
with given error probability exponent for certain sources,” in 6th International
Symposium on Information Theory, vol. 1, pp. 22–23, Tashkent, 1984. (in
Russian).
[125] E. A. Haroutunian and A. O. Yessayan, “On hypothesis optimal testing for two
differently distributed objects,” Transactions of the Institute for Informatics
and Automation Problems NAS of RA, Mathematical Problems of Computer
Science, vol. 25, pp. 89–94, 2006.
[126] E. A. Haroutunian and A. O. Yessayan, “On logarithmically asymptotically
optimal hypothesis testing for pair of statistically dependent objects,” Trans-
actions of the Institute for Informatics and Automation Problems NAS of RA,
Mathematical Problems of Computer Science, vol. 29, pp. 117–123, 2007.
[127] M. E. Haroutunian, “E-capacity of arbitrarily varying channel with informed
encoder,” Problems of Information Transmission, (in Russian), vol. 26, no. 4,
pp. 16–23, 1990.
[128] M. E. Haroutunian, “Bounds of E-capacity for the channel with random
parameter,” Problems of Information Transmission, (in Russian), vol. 27,
no. 1, pp. 14–23, 1991.
[129] M. E. Haroutunian, “About achievable rates of transmission for interference
channel,” Transactions of the Institute for Informatics and Automation Prob-
lems NAS of RA and YSU, Mathematical Problems of Computer Science,
vol. 20, pp. 79–89, 1998.
[130] M. E. Haroutunian, “Random coding bound for E-capacity region of the
broadcast channel,” Transactions of the Institute for Informatics and Automation
Problems NAS of RA and YSU, Mathematical Problems of Computer Science,
vol. 21, pp. 50–60, 2000.
[131] M. E. Haroutunian, “New bounds for E-capacities of arbitrarily varying chan-
nel and channel with random parameter,” Transactions of the Institute for
Informatics and Automation Problems NAS of RA and YSU, Mathematical
Problems of Computer Science, vol. 22, pp. 44–59, 2001.
[132] M. E. Haroutunian, “On E-capacity region of multiple-access channel,” (in
Russian) Izvestiya Akademii Nauk Armenii, Matematika, vol. 38, no. 1, pp. 3–
22, 2003.
[133] M. E. Haroutunian, “On multiple-access channel with random parameter,” in
Proceedings of International Conference on Computer Science and Informa-
tion Technology, pp. 174–178, Yerevan, Armenia, 2003.
[134] M. E. Haroutunian, “Bounds of E-capacity for multiple-access channel with
random parameter,” in Lecture Notes in Computer Science, General Theory
of Information Transfer and Combinatorics, pp. 166–183, Springer Verlag,
2005.
[135] M. E. Haroutunian and A. H. Amirbekyan, “Random coding bound for
E-capacity region of the channel with two inputs and two outputs,” Transac-
tions of the Institute for Informatics and Automation Problems NAS of RA
and YSU, Mathematical Problems of Computer Science, vol. 20, pp. 90–97,
1998.
[136] M. E. Haroutunian and S. A. Tonoyan, “On estimates of rate-reliability-
distortion function for information hiding system,” Transactions of the
Institute for Informatics and Automation Problems NAS of RA and YSU,
Mathematical Problems of Computer Science, vol. 23, pp. 20–31, 2004.
[137] M. E. Haroutunian and S. A. Tonoyan, “Random coding bound of information
hiding E-capacity,” in Proceedings of IEEE International Symposium on Infor-
mation Theory, p. 536, Chicago, USA, 2004.
[138] M. E. Haroutunian and S. A. Tonoyan, “On information hiding system with
multiple messages,” Transactions of the Institute for Informatics and Automa-
tion Problems NAS of RA and YSU, Mathematical Problems of Computer
Science, vol. 24, pp. 89–103, 2005.
[139] M. E. Haroutunian, S. A. Tonoyan, O. Koval, and S. Voloshynovskiy, “Random
coding bound of reversible information hiding E-capacity,” Transactions of
the Institute for Informatics and Automation Problems NAS of RA and YSU,
Mathematical Problems of Computer Science, vol. 28, pp. 18–33, 2007.
[140] A. N. Haroutunian (Harutyunyan) and E. A. Haroutunian, “An achievable
rates-reliabilities-distortions dependence for source coding with three descrip-
tions,” Transactions of the Institute for Informatics and Automation Problems
NAS of RA and YSU, Mathematical Problems of Computer Science, vol. 17,
pp. 70–75, 1997.
[141] A. N. Harutyunyan, “Notes on conditions for successive refinement of informa-
tion,” in Lecture Notes in Computer Science, General Theory of Information
Transfer and Combinatorics, pp. 130–138, Springer Verlag, 2006.
[142] A. N. Harutyunyan and A. J. Han Vinck, “Error exponent in AVS cod-
ing,” in Proceedings of IEEE International Symposium on Information Theory,
pp. 2166–2170, Seattle, WA, July 9–14 2006.
[143] A. N. Harutyunyan and E. A. Haroutunian, “On properties of rate-reliability-
distortion function,” IEEE Transactions on Information Theory, vol. 50,
no. 11, pp. 2768–2769, 2004.
[144] A. S. Hekstra and F. M. J. Willems, “Dependence balance bounds for single-
output two-way channels,” IEEE Transactions on Information Theory, vol. 35,
no. 1, pp. 44–53, 1989.
[145] W. Hoeffding, “Asymptotically optimal tests for multinomial distributions,”
Annals of Mathematical Statistics, vol. 36, pp. 369–401, 1965.
[146] B. L. Hughes and T. G. Thomas, “On error exponents for arbitrarily varying
channels,” IEEE Transactions on Information Theory, vol. 42, no. 1, pp. 87–
98, 1996.
[147] S. Ihara, Information Theory for Continuous Systems. Singapore: World Sci-
entific, 1993.
[148] P. Jacquet and W. Szpankowski, “Markov types and minimax redundancy for
Markov sources,” IEEE Transactions on Information Theory, vol. 50, no. 7,
pp. 1393–1402, 2004.
[149] J. Jahn, “Coding of arbitrarily varying multiuser channels,” IEEE Transac-
tions on Information Theory, vol. 27, no. 2, pp. 212–226, 1981.
[150] F. Jelinek, “Evaluation of expurgated bound exponents,” IEEE Transactions
on Information Theory, vol. 14, pp. 501–505, 1968.
[151] A. Kanlis and P. Narayan, “Error exponents for successive refinement by par-
titioning,” IEEE Transactions on Information Theory, vol. 42, no. 1, pp. 275–
282, 1996.
[152] V. D. Kolesnik and G. S. Poltyrev, Textbook of Information Theory. (in Rus-
sian), Nauka, Moscow, 1982.
[153] J. Körner and K. Marton, “General broadcast channels with degraded message
sets,” IEEE Transactions on Information Theory, vol. 23, pp. 60–64, 1977.
[154] J. Körner and A. Sgarro, “Universally attainable error exponents for broadcast
channels with degraded message sets,” IEEE Transactions on Information
Theory, vol. 26, pp. 670–679, 1980.
[155] V. N. Koshelev, “Multilevel source coding and data-transmission theorem,”
in Proceedings of VII All-Union Conference on Theory of Coding and Data
Transmission, pp. 85–92, Vilnius, U.S.S.R., pt. 1, 1978.
[156] V. N. Koshelev, “Hierarchical coding of discrete sources,” (in Russian), Prob-
lems of Information Transmission, vol. 16, no. 3, pp. 31–49, 1980.
[157] V. N. Koshelev, “An evaluation of the average distortion for discrete scheme
of sequential approximation,” (in Russian), Problems of Information Trans-
mission, vol. 17, no. 3, pp. 20–30, 1981.
[158] V. N. Koshelev, “On divisibility of discrete sources with the single-letter-
additive measure of distortion,” Problems of Information Transmission,
vol. 30, no. 1, pp. 31–50, (in Russian), 1994.
[159] B. D. Kudryashov and G. S. Poltyrev, “Upper bounds for decoding error prob-
ability in some broadcast channels,” (in Russian), Problems of Information
Transmission, vol. 15, no. 3, pp. 3–17, 1979.
[160] S. Kullback, Information Theory and Statistics. New York: Wiley, 1959.
[161] E. Levitan and N. Merhav, “A competitive Neyman-Pearson approach to uni-
versal hypothesis testing with applications,” IEEE Transactions on Informa-
tion Theory, vol. 48, no. 8, pp. 2215–2229, 2002.
[162] Y. Liang and G. Kramer, “Rate regions for relay broadcast channels,” IEEE
Transactions on Information Theory, vol. 53, no. 10, pp. 3517–3535, 2007.
[163] Y. N. Lin’kov, “On asymptotical discrimination of two simple statistical
hypotheses,” (in Russian), Preprint 86.45, Kiev, 1986.
[164] Y. N. Lin’kov, “Methods of solving asymptotical problems of two simple sta-
tistical hypotheses testing,” (in Russian), Preprint 89.05, Doneck, 1989.
[165] Y. N. Lin’kov, Asymptotical Methods of Random Processes Statistics, (in Rus-
sian). Naukova Dumka, Kiev, 1993.
[166] Y. S. Liu and B. L. Hughes, “A new universal coding bound for the multiple-
access channel,” IEEE Transactions on Information Theory, vol. 42, pp. 376–
386, 1996.
[167] G. Longo and A. Sgarro, “The error exponent for the testing of simple sta-
tistical hypotheses: A combinatorial approach,” Journal of Combinatorics,
Information &amp; System Sciences, vol. 5, no. 1, pp. 58–67, 1980.
[168] A. A. Lyapunov, “On selection between finite number of distributions,” (in
Russian), Uspekhi Matematicheskikh Nauk, vol. 6, no. 1, pp. 178–186, 1951.
[169] I. Marić, R. D. Yates, and G. Kramer, “Capacity of interference channels with
partial transmitter cooperation,” IEEE Transactions on Information Theory,
vol. 53, no. 10, pp. 3536–3548, 2007.
[170] R. S. Maroutian, “Achievable rates for multiple descriptions with given expo-
nent and distortion levels,” (in Russian), Problems of Information Transmis-
sion, vol. 26, no. 1, pp. 83–89, 1990.
[171] K. Marton, “Error exponent for source coding with a fidelity criterion,” IEEE
Transactions on Information Theory, vol. 20, no. 2, pp. 197–199, 1974.
[172] K. Marton, “A coding theorem for the discrete memoryless broadcast chan-
nel,” IEEE Transactions on Information Theory, vol. 25, pp. 306–311, 1979.
[173] U. M. Maurer, “Authentication theory and hypothesis testing,” IEEE Trans-
actions on Information Theory, vol. 46, no. 4, pp. 1350–1356, 2000.
[174] N. Merhav, “On random coding error exponents of watermarking systems,”
IEEE Transactions on Information Theory, vol. 46, no. 2, pp. 420–430, 2000.
[175] P. Moulin and J. A. O’Sullivan, “Information theoretic analysis of information
hiding,” IEEE Transactions on Information Theory, vol. 49, no. 3, pp. 563–
593, 2003.
[176] P. Moulin and Y. Wang, “Capacity and random-coding exponents for channel
coding with side information,” IEEE Transactions on Information Theory,
vol. 53, no. 4, pp. 1326–1347, 2007.
[177] S. Natarajan, “Large deviations, hypotheses testing, and source coding for
finite Markov chains,” IEEE Transactions on Information Theory, vol. 31,
no. 3, pp. 360–365, 1985.
[178] J. K. Omura, “A lower bounding method for channel and source coding prob-
abilities,” Information and Control, vol. 27, pp. 148–177, 1975.
[179] A. Peres, “Second-type-error exponent given the first-type-error exponent in
the testing statistical hypotheses by unfitted procedures,” in 6th International
Symposium on Information Theory, pp. 277–279, Tashkent, Part 1, 1984.
[180] M. S. Pinsker, “Capacity of noiseless broadcast channels,” Problems of Infor-
mation Transmission, (in Russian), vol. 14, no. 2, pp. 28–34, 1978.
[181] M. S. Pinsker, “Multi-user channels,” in II Joint Swedish-Soviet International
Workshop on Information Theory, pp. 160–165, Gränna, Sweden, 1985.
[182] J. Pokorny and H. M. Wallmeier, “Random coding bound and codes pro-
duced by permutations for the multiple-access channel,” IEEE Transactions
on Information Theory, vol. 31, pp. 741–750, 1985.
[183] G. S. Poltyrev, “Random coding bounds for some broadcast channels,” (in
Russian), Problems of Information Transmission, vol. 19, no. 1, pp. 9–20,
1983.
[184] H. V. Poor and S. Verdú, “A lower bound on the probability of error in
multihypothesis testing,” IEEE Transactions on Information Theory, vol. 41,
no. 6, pp. 1992–1995, 1995.
[185] V. V. Prelov, “Information transmission by the multiple access channel with
certain hierarchy of sources,” (in Russian), Problems of Information Trans-
mission, vol. 20, no. 4, pp. 3–10, 1984.
[186] A. Puhalskii and V. Spokoiny, “On large deviation efficiency in statistical inference,”
Bernoulli, vol. 4, no. 2, pp. 203–272, 1998.
[187] B. Rimoldi, “Successive refinement of information: Characterization of the
achievable rates,” IEEE Transactions on Information Theory, vol. 40, no. 1,
pp. 253–259, 1994.
[188] H. Sato, “Two-user communication channels,” IEEE Transactions on Infor-
mation Theory, vol. 23, no. 3, pp. 295–304, 1977.
[189] H. Sato, “On the capacity region of a discrete two-user channel for strong
interference,” IEEE Transactions on Information Theory, vol. 24, no. 3,
pp. 377–379, 1978.
[190] H. Sato, “An outer bound to the capacity region of broadcast channels,” IEEE
Transactions on Information Theory, vol. 24, pp. 374–377, 1978.
[191] C. E. Shannon, “A mathematical theory of communication,” Bell System
Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.
[192] C. E. Shannon, “The zero-error capacity of a noisy channel,” IRE Transactions
on Information Theory, vol. 2, no. 1, pp. 8–19, 1956.
[193] C. E. Shannon, “Channels with side information at the transmitter,” IBM
Journal on Research and Development, vol. 2, no. 4, pp. 289–293, 1958.
[194] C. E. Shannon, “Coding theorems for a discrete source with a fidelity crite-
rion,” IRE National Convention Record, vol. 7, pp. 142–163, 1959.
[195] C. E. Shannon, “Probability of error for optimal codes in a Gaussian channel,”
Bell System Technical Journal, vol. 38, no. 5, pp. 611–656, 1959.
[196] C. E. Shannon, “Two-way communication channels,” in Proceedings of 4th
Berkeley Symposium on Mathematical Statistics and Probability, pp. 611–644,
Berkeley: University of California Press, 1961.
[197] C. E. Shannon, “Works on information theory and cybernetics,” in Collection
of Papers, (R. L. Dobrushin and O. B. Lupanov, eds.), Moscow: Publishing
House of Foreign Literature, 1963. (in Russian).
[198] C. E. Shannon, R. G. Gallager, and E. R. Berlekamp, “Lower bounds to
error probability for coding on discrete memoryless channels,” Information and
Control, vol. 10, no. 1, pp. 65–103, and no. 5, pp. 522–552, 1967.
[199] D. Slepian and J. K. Wolf, “A coding theorem for multiple access channels
with correlated sources,” Bell System Technical Journal, vol. 52, pp. 1037–
1076, 1973.
[200] A. Somekh-Baruch and N. Merhav, “On error exponent and capacity games of
private watermarking systems,” IEEE Transactions on Information Theory,
vol. 49, no. 3, pp. 537–562, 2003.
[201] A. Somekh-Baruch and N. Merhav, “On the capacity game of public water-
marking systems,” IEEE Transactions on Information Theory, vol. 50, no. 3,
pp. 511–524, 2004.
[202] A. Somekh-Baruch and N. Merhav, “On the random coding error exponents
of the single-user and the multiple-access Gelfand-Pinsker channels,” in Pro-
ceedings of IEEE International Symposium on Information Theory, p. 448,
Chicago, USA, 2004.
[203] H. H. Tan, “Two-user interference channels with correlated information
sources,” Information and Control, vol. 44, no. 1, pp. 77–104, 1980.
[204] S. A. Tonoyan, “Computation of information hiding capacity and E-capacity
lower bounds,” Transactions of the Institute for Informatics and Automation
Problems NAS of RA and YSU, Mathematical Problems of Computer Science,
vol. 26, pp. 33–37, 2006.
[205] E. Tuncel, “On error exponents in hypothesis testing,” IEEE Transactions on
Information Theory, vol. 51, no. 8, pp. 2945–2950, 2005.
[206] E. Tuncel and K. Rose, “Error exponents in scalable source coding,” IEEE
Transactions on Information Theory, vol. 49, pp. 289–296, January 2003.
[207] G. Tusnády, “On asymptotically optimal tests,” Annals of Statistics, vol. 5,
no. 2, pp. 385–393, 1977.
[208] G. Tusnády, “Testing statistical hypotheses (an information theoretic
approach),” Preprint, Mathematical Institute of the Hungarian Academy of Sciences,
Budapest, 1979, 1982.
[209] E. C. van der Meulen, “The discrete memoryless channel with two senders and
one receiver,” in Proceedings of 2nd International Symposium on Information
Theory, Tsahkadzor, Armenia, 1971, pp. 103–135, Budapest: Akadémiai Kiadó,
1973.
[210] E. C. van der Meulen, “Random coding theorems for the general discrete
memoryless broadcast channel,” IEEE Transactions on Information Theory,
vol. 21, pp. 180–190, 1975.
[211] E. C. van der Meulen, “A survey of multi-way channels in information the-
ory: 1961–1976,” IEEE Transactions on Information Theory, vol. 23, no. 1,
pp. 1–37, 1977.
[212] E. C. van der Meulen, “Some recent results on the asymmetric multiple-access
channel,” in Proceedings of 2nd Joint Swedish-Soviet International Workshop
on Information Theory, pp. 172–176, Gränna, Sweden, 1985.
[213] E. C. van der Meulen, E. A. Haroutunian, A. N. Harutyunyan, and A. R.
Ghazaryan, “On the rate-reliability-distortion and partial secrecy region of a
one-stage branching communication system,” in Proceedings of IEEE Inter-
national Symposium on Information Theory, p. 211, Sorrento, Italy, 2000.
[214] S. Verdú, “Asymptotic error probability of binary hypothesis testing for Pois-
son point-process observations,” IEEE Transactions on Information Theory,
vol. 32, no. 1, pp. 113–115, 1986.
[215] S. Verdú, “Guest Editorial,” IEEE Transactions on Information Theory,
vol. 44, no. 6, pp. 2042–2043, 1998.
[216] S. Verdú, Multiuser Detection. Cambridge University Press, 1998.
[217] F. M. J. Willems, Information Theoretical Results for the Discrete Memoryless
Multiple Access Channel. PhD thesis, Katholieke Universiteit Leuven, 1982.
[218] F. M. J. Willems, “The maximal-error and average-error capacity region of
the broadcast channel are identical,” Problems of Control and Information
Theory, vol. 19, no. 4, pp. 339–347, 1990.
[219] F. M. J. Willems and E. C. van der Meulen, “The discrete memoryless
multiple-access channel with cribbing encoders,” IEEE Transactions on Infor-
mation Theory, vol. 31, no. 3, pp. 313–327, 1985.
[220] J. K. Wolf, A. D. Wyner, and J. Ziv, “Source coding for multiple descriptions,”
Bell System Technical Journal, vol. 59, no. 8, pp. 1417–1426, 1980.
[221] J. Wolfowitz, “Simultaneous channels,” Archive for Rational Mechanics and
Analysis, vol. 4, no. 4, pp. 371–386, 1960.
[222] J. Wolfowitz, Coding Theorems of Information Theory. Berlin-Heidelberg:
Springer Verlag, 3rd ed., 1978.
[223] H. Yamamoto, “Source coding theory for cascade and branching communi-
cation systems,” IEEE Transactions on Information Theory, vol. 27, no. 3,
pp. 299–308, 1981.
[224] R. Yeung, A First Course in Information Theory. New York: Kluwer Aca-
demic, 2002.
[225] O. Zeitouni and M. Gutman, “On universal hypothesis testing via large devia-
tions,” IEEE Transactions on Information Theory, vol. 37, no. 2, pp. 285–290,
1991.
[226] Z. Zhang and T. Berger, “New results in binary multiple descriptions,” IEEE
Transactions on Information Theory, vol. 33, no. 4, pp. 502–521, 1987.
[227] Z. Zhang and T. Berger, “Estimation via compressed information,” IEEE
Transactions on Information Theory, vol. 34, no. 2, pp. 198–211, 1988.
[228] Z. Zhang, T. Berger, and J. P. M. Schalkwijk, “New outer bounds to capac-
ity regions of two-way channels,” IEEE Transactions on Information Theory,
vol. 32, no. 3, pp. 383–386, 1986.
[229] J. Ziv, “On classification with empirically observed statistics and universal
data compression,” IEEE Transactions on Information Theory, vol. 34, no. 2,
pp. 278–286, 1988.
[230] G. Zoutendijk, Methods of Feasible Directions. A Study in Linear and Non-
Linear Programming. Amsterdam: Elsevier, 1960.