Download as pdf or txt
Download as pdf or txt
You are on page 1of 26

1

Rateless coding with partial state information at the decoder


Anand D. Sarwate Student Member, IEEE, and Michael Gastpar Member, IEEE

Abstract The problem of coding for channels with time-varying state is studied. Two different models are considered: one in which the channel state cannot depend on the transmitted signal, and the other in which it can. When randomized coding using a secret key is permitted, schemes are developed that achieve the point-to-point randomized coding capacity of these channels for a range of key size and error decay tradeoffs. These schemes are based on derandomizing fully random codes via sampling and randomizing list codes via message authentication. These constructions are further generalized to rateless code constructions where the decoder is given partial information about the empirical channel. Bounds on the tightness of the achieved rates are derived in terms of the quality of the side information.

I. I NTRODUCTION In his 1972 paper on broadcast channels, Thomas Cover compared certain broadcasting problems to giving a lecture to a group of disparate backgrounds and aptitudes. Suppose that at a certain progressive university, a professor embarked on a controversial new lecturing technique for her morning class. Each students alertness would wax and wane during the course of the lecture, due to such factors as the amount they had slept, their activities the previous night, or whether they had drunk any coffee. Since the professor could not tell how fast the students would learn, she decided that they should be the best judge of their own wakefulness. Each student was given an identical notesheet of xed size. During the lecture, she would attempt to teach the students a xed amount of information. The students were to take notes as they understood the material; once their sheet was full, they could leave. The professor would look up at the class every minute to see if anyone was left, and the lecture would end only when every student had voluntarily left the room. The professors goal was to design her lecture to allow each student to spend just as much time in class as was necessary for him or her to learn the material. Consider rst the problem of lecturing to a single student for a xed length of time. A simple informationtheoretic model might model the lecture as encoding the message into a sequence of facts (the codeword) which is then conveyed via a memoryless channel W (y|x) to the student. This does not quite capture the situation, since the students attention may vary over time. We can model this via a channel W (y|x, s) with state s S and partial information about the state sequence available to the decoder. There are now at least two different models we can pursue, depending on whether or not we assume the students ignorance is independent of the facts. In the former case, we can model the channel as a standard arbitrarily varying channel (AVC), and in the latter as an AVC with input-dependent state. Our professors lecturing strategy can be thought of as that of designing a rateless code for an arbitrarily varying channel with partial state information at the decoder. Again, we can consider two cases depending on whether we allow the channel state to depend on the transmitted codeword. A single transmitter (the professor) wishes to communicate a common message (the information) to a group of receivers (the students) over channels with varying states (the complicated factors). The channels are unknown to the transmitter, but partially known to the decoder (the students measure of their own wakefulness). The transmitter and receivers share a secret key (the notesheet) that is independent of the message. The transmitter encodes the message into a codeword (the lecture) such that that receivers can decode (leave the room) at a rate compatible with their state information.
Manuscript received November XX, 2007; revised XXXXXXXXXXXXXX. Part of this work was presented at the 2007 IEEE Symposium on Information Theory in Nice, France, and at the 2007 IEEE Information Theory Workshop in Tahoe, CA. The authors are with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley CA 94720-1770 USA. The work of A.D. Sarwate and M. Gastpar was supported in part by the National Science Foundation under award CCF-0347298.

We model the state as being selected by a malicious jammer who wishes to minimize the rate of communication. The dependence or independence of the channel state on the input characterizes the jammers capabilities and can be captured in our error criterion. In some cases the achievable rates are the same. For example, consider a channel with binary input, output, and state, where the output is the modulo-two sum of the input and state and the empirical frequency of 1s in the state sequence is bounded by . For this AVC it is well-known that randomized codes achieve rates 1 hb () for input-independent [15] and input-dependent [29] state. If instead the output is the real addition of the input and state, then the result changes. If the state is independent of the input and 1/2, the randomized coding capacity is 1/2 [9], [16]. However, if the state can depend on the input, the situation is considerably more grim, since the jammer can elect to expend his budget of 1s only on those places where the transmitter sends a 0. In this case the capacity will be lower. This study is motivated by problems arising in spectrum sharing or cognitive radio systems. In these communication applications, several systems maybe using the same frequency band at the same time. As a result, the interference experienced by one system may be time varying and poorly modeled by stationary noise. Our stronger nosy noise model for channels with input-dependent state can be thought of as a model for packet-tampering in ad-hoc wireless networks. Here the corruption experienced by a packet may act non-causally. Our xed blocklength channel models are conservative and focus on guaranteeing performance in the worst case. The rateless code constructions that follow are one way to ease away from the worst case model while providing protection against strong interference. A. Related work An arbitrarily varying channel is a channel with a time-varying state sequence controlled by a malicious jammer who wishes to minimize the maximum rate of reliable communication between the encoder and decoder. The literature on arbitrarily varying channels is replete with technical distinctions between different error criteria (maximal versus average), allowable coding strategies (deterministic versus randomized), and constraint types (peak versus average). These distinctions have an impact on the capacity formulae. One major open problem is to nd the deterministic coding capacity of AVCs. This problem was rst studied by Kiefer and Wolfowitz [27], Ahlswede and Wolfowitz [8], and Ahlswede [6], with the best results due to Csisz r and K rner [13]. However, the capacity a o for this model is not known in general and nding the capacity for general AVCs would give a solution to the zero-error capacity [2]. Other AVC models can be thought of as relaxations from this problem they weaken the error requirement or make the encoding strategies more powerful. Arbitrarily varying channels were rst studied in the seminal paper of Blackwell, Breiman, and Thomasian [9]. There they considered randomized coding for maximal error, in which the encoder and decoder share an unlimited source of common randomness. They proved that the randomized coding capacity Cr is a max-min of the mutual information between the input and output of the channel. Ericson computed error bounds in terms of the available common randomness [21]. 
Csisz r and Narayan introduced the idea of placing constraints on the channel input a and state sequences, with an eye to studying Gaussian AVCs [17], [23]. They calculated the randomized coding capacity under maximal error [15] and the deterministic coding capacity under average error [16]. Hughes and Thomas studied the exponents for these cost constrained models [26]. Another relaxation involves changing the denition of the probability from the maximum over all messages to the average over messages. In the noisy channel coding theorem, the probabilistic method is used to choose a random codebook with small average error, which in turn shows that there exists a codebook with small maximal error. Such arguments will not work for AVCs, because we cannot guarantee that a large subset of codewords exists whose error is small under all state sequences. Ahlswedes landmark paper [5] showed that the average error capacity under deterministic coding Cd is 0 or equal to the randomized coding capacity. Later, Csisz r and Narayan a [16] showed that Ericsons symmetrizability [21] was necessary and sufcient to render the capacity 0. They also found the deterministic coding capacity under average error Cd () for cost constrained channels and showed that Cd () may be positive but strictly less than the randomized coding capacity. List decoding is a third approach to making the AVC coding problem easier. List codes for AVCs under maximal error were rst studied by Ahlswede [3], who later showed that rates within O(L1 ) of Cr were achievable under maximal error with lists of size L [7]. For list coding under average error, Hughes [25] and Blinovsky, Narayan, and Pinsker [10] found that the notion of symmetrizability generalizes and showed that the randomized coding

JAM
s i x y i

E NC

W (y|x, s)

D EC

secret key
Fig. 1. Communicating over a channel whose state can depend on the message i and channel input x non-causally. The jammer can choose s to be a function of i and x but does not have access to the common randomness (secret key) shared by the encoder and decoder.

capacity is achievable under average-error using list decoding. Most recently, we have extended these techniques to the case of list decoding with constraints on the state sequences [31]; these results will be used here in our randomized coding strategy. In one model discussed in this paper, the state sequence can depend non-causally on the transmitted codeword, which means that the set of strategies for the jammer has size doubly-exponential in the blocklength. This channel model, shown in Figure 1, has been discussed previously in the AVC literature, where it is sometimes called the A VC. For deterministic coding, knowing the message is the same as knowing the codeword, so the maximal error capacities are identical [14, Problem 2.6.21], and Ahlswede and Wolfowitz showed the average error is the same [8]. The capacity of this channel under noiseless feedback was later found by Ahlswede [4]. To our knowledge, for cost-constrained AVCs the problem was not studied until Langberg [29] found the capacity for bit-ipping channels with randomized coding. Smith [35] has shown a computationally efcient construction using O(n) bits of common randomness. Our construction in this paper generalizes that of Langberg to general cost-constrained AVCs and we show that related construction can also be useful for robust rateless coding. A related but different channel model is considered by Agarwal, Sahai, and Mitter [1]. In their model, the channel is constrained to output a sequence that lies within some distortion of the channel input sequence. For some distortion measures, their channel model coincides with an AVC model. For example, difference distortion measures can be captured by an additive channel with an appropriate cost constraint. By xing the input distribution and allowing a number of shares bits polynomially large in the blocklength n, they show that the capacity of this channel is equal to a rate-distortion function. In our point-to-point model, we do not x the input distribution and we explicitly limit the required key size to a number of bits sublinear in n. Rateless codes are used to communicate over time varying channels when a low-rate feedback link can be used by the decoder to terminate the transmission. Figure 2 shows a diagram of a rateless communication channel. Rateless codes were rst studied in the context of the erasure channel [30], [33] and later discrete memoryless compound channels [19], [34], [36]. Draper, Frey, and Kschischang [18] investigated rateless coding over AVCs for average error with full state information at the decoder. Coding for individual state channels will full feedback was considered by Shayevitz and Feder [32]. Our approach here is to assume only partial state information at the decoder, which results in rates lower than that of [18]. Our coding construction may be used to partially derandomize the construction in [22] by providing the requisite common randomness in the feedback link. B. What is in this paper? In Section IV we prove two results on block coding for arbitrarily varying channels with cost constraints. In Theorem 1 we provide a partially derandomized code construction suggested by Csisz r and Narayan [15] and a Hughes and Thomas [26]. This result is mainly of interest as a point of comparison with the more robust error model. Our main new contribution in this section is to nd the randomized coding capacity when the jammer knows the transmitted codeword. 
Theorem 2 gives a formula for the capacity of this channel and we provide an achievable coding scheme based on list codes for AVCs. Our code construction furthermore gives explicit tradeoffs between the rate, amount of common randomness, and error decay. In Sections V and VI we turn to the problem of rateless coding over cost-constrained AVCs with partial channel state information (CSI) at the decoder. For the regular cost-constrained AVC, we construct a rateless code based

s i x

CSI

E NC key

W (y|x, s) c

D EC key

Fig. 2. Rateless communication system. The encoder and decoder share a source of common randomness. A single bit of feedback is available every c channel uses for the decoder to terminate transmission. Some partial information about the channel state is available at the decoder every c channel uses in a causal fashion.

on an expurgation argument. For the case where the state sequence may depend on the transmitted codeword, we introduce a new model of partial CSI in which the decoder is given a set of channels in which the true empirical channel lies. By concatenating list-decodable codes, we nd a more opportunistic decoding strategy whose only negligible loss from the true empirical capacity is from the channel estimation. Our decoder for channels with input-dependent state has two stages. The rst is list decoding in which the decoder outputs a short list of candidate codewords, as opposed to the exponential-size list codes used in relay channels [12, Section 14.7]. We guarantee that the transmitted codeword is on the list with high probability. The second component of our code is a message authentication step, in which the list created by the decoder is compared against the secret key. The codeword that is actually transmitted is chosen using the message and a small amount of common randomness, which we will call the secret key shared by the encoder and decoder. This key is independent of the message, so each key corresponds to a subset of codewords in our list-decodable code. The crucial property of the authentication step is that with high probability, there is only one codeword on the list that is consistent with the key. This construction was rst proposed by Langberg [29] for a binary additive channel, and we use it to prove a more general result on discrete AVCs with nite state sets. II. C HANNEL
MODEL DEFINITIONS

For an integer M we will let [M ] = {1, 2, . . . , M }. The notation xn is the set of elements (xm , . . . , xn ). The set m P(X ) is the set of all probability distributions on a set X , and the set Pn (X ) is the set of all types of denominator n on X . Let Tx be the type of a sequence x. Let dmax (P, Q) be the maximum deviation ( distance) between two probability distributions P and Q:
dmax (P, Q) = max |P (x) Q(x)| .
xX The set TV (x) is the (V, )-shell around x: TV (x) =

(1)

y : dmax Ty ,
x

Tx (x)V (y|x)

<

(2)

A. Channel model We will model our time-varying channel by an arbitrarily varying channel, which is a set W = {W (|, s) : s S} of channels from an input alphabet X to an output alphabet Y parameterized by a state s S . We will assume the sets X , Y and S are nite. If x = (x1 , x2 , . . . , xn ), y = (y1 , y2 , . . . , yn ) and s = (s1 , s2 , . . . , sn ) are length n vectors, the probability of y given x and s is given by:
n

W (y|x, s) =
i=1

W (yi |xi , si ) .

(3)

The interpretation of (3) is that the channel state can change arbitrarily from time to time. We will think of this as an adversarial model in which the state is controlled by a jammer who wishes to stymie the communication

between the encoder and decoder. As we will see, the capabilities of this adversary can be captured in the error criterion. One extension of this model is to introduce constraints on the input and state sequences [15]. We will focus only on the latter in this paper. A cost function is a function l : S R+ , and we will assume maxsS l(s) = < . The cost of a vector s = (s1 , s2 , . . . , sn ) is the sum of the cost on the elements:
n

l(s) =
i=1

l(si ) .

(4)

In point-to-point xed blocklength channel coding problems, we assume a constraint on the average state cost, so that
l(s) n a.s. .

(5)

If then we say the AVC is unconstrained. Let S n () = {s : l(s) n} be the set of sequences with average cost less than or equal to . B. Coding denitions and error models In the AVC literature, the form of the capacity formula is generally determined by two factors: the allowable coding strategies and the error criterion. In a randomized code, the encoder and decoder share a source of common randomness with which they may randomize their coding strategy, whereas a deterministic code uses a xed mapping from messages to codewords. The state sequence may depend on different quantities the message, the transmitted codeword, or both. Furthermore, we may relax the denition of correct decoding to allow the decoder to output a list of candidate codewords. All of these changes affect the error criterion. An (n, N, K) randomized code for the AVC is a family of maps {(k , k ) : k [K]} indexed by a set of K keys. The encoding maps are functions k : [N ] X n and the decoding maps are k : Y n [N ]. The rate of the code is R = n1 log N . The decoding region for message i under key k is Di,k = {y : k (y) = i}. The power of randomized codes comes from modifying the denition of the error probability. Rather than demanding that the decoder error be small for every message and every key value, we instead require it to be small for every message on average over key values. Here we assume the key is chosen uniformly in the set [K]. For standard AVC in which the jammer does not know the codeword, the maximum probability of error is given by
r = max
i

sS n ()

max

1 K

K k=1

(1 W n (Di,k |k (i), s)) .

(6)

We will say a sequence of pairs (R(n), K(n)) is achievable under maximal error if there exists a sequence of (n, 2nR(n) , K(n)) randomized codes whose error r 0 as n . We refer to the case when the jammer knows both the message i and the transmitted codeword k (i) as a channel with nosy noise. Let J () = {J : [N ] X n S n ()}. In this scenario we dene the maximum probability of error by
r = max max
i

1 JJ () K

K k=1

(1 W n (Di,k |k (i), J(i, k (i)))) .

(7)

We will say a sequence of pairs (R(n), K(n)) is achievable under maximal error with input-dependent state if there exists a sequence of (n, 2nR(n) , K(n)) randomized codes whose error r 0 as n . An (n, N, L) deterministic list code for the AVC is a pair of maps (, ) where the encoding function is : [N ] X n and the decoding function is : Y n [N ]L . The rate of the code is R = n1 log(N/L). The codebook is the set of vectors {xi : i [N ]}, where xi = (i). The decoding region for message i is Di = {y : i (y)}. We will often specify a code by the pairs {(xi , Di ) : i [N ]}, with the encoder and decoder implicitly dened. For list codes we can also dene the maximum probability of error:
L = max max (1 W n (Di |(i), s)) .
i s

(8)

Because the decoding regions Di can overlap for list codes, the error probability can be small without randomization for every message i. A sequence (R(n), L(n)) is achievable if there exists a sequence of (n, L(n)2nR(n) , L(n)) list codes whose error L 0 as n . C. Information quantities For a given state sequence s we can compute the average empirical channel
Ws (y|x) = 1 n
n

W (y|x, si ) .
i=1

(9)

The set of empirical channels will be denoted by Wn = {Ws (y|x) : s S n }. For a xed input distribution P (x) on X and channel W (y|x), the mutual information is given by the usual denition:
I (P, W ) =
x,y

W (y|x)P (x) log

W (y|x)P (x) . P (x) x W (y|x )P (x )

We will also dene the following sets:


Q() = U(P, ) = Q P(S) : Q(s)l(s) U (s|x)P (x)l(s) .

(10) (11)

U P(S|X ) :

s,x

For an AVC W = {W (Y |X, S) : s S} with state constraint we can dene two sets of channels W() by W() = Wdep (P, ) = V (Y |X) : V (y|x) = V (Y |X) : V (y|x) =
s

W (y|x, s)Q(s), Q(s) Q() W (y|x, s)U (s|x), U (s|x) U(P, ) .

(12) (13)

We will suppress the explicit dependence on . The set in (12) is called the convex closure of W , and the set in (13) is the row-convex closure of W . In earlier works Wdep (P, ) is sometimes written as W . We will denote by I (P, V ) the mutual information I (X Y ) with input distribution P (x) and channel V (y|x). We will also dene
I (P, V) = min I (P, V ) .
V V

(14)

This is the minimum mutual information of a channel with xed input distribution P and channel V V . We can dene the randomized coding capacities of these AVCs by:
Cr () = max I P, W() Cdep () = max I P, Wdep (P, ) .
P P(X ) P P(X )

(15) (16)

With some abuse of notation, we will dene I (P, ) = I P, W() . The capacity result (15) was shown by Blackwell, Breiman, and Thomasian for unconstrained AVCs [9] and Csisz r and K rner [15] for constrained a o AVCs. Our Theorem 1 proves the observation made by Csisz r and Narayan [15] and Hughes and Thomas [26] a that the randomized coding capacity is achievable with limited common randomness. In Theorem 2 we prove that the under the error model in (7) the randomized coding capacity is given by (16) and show that this capacity is also achievable with limited common randomness. For the standard AVC model, our error bounds will be stated in terms of the random coding error exponent for these channels. The exponent as a function of the rate R, AVC W , input distribution P , and iid state distribution Q was found by Hughes and Thomas [26]: Er (R, W, P, Q) =
PXY S :PX =P,PS =Q

min

D PXY S

W P Q + I(P, PY |X ) R

(17)

where
(W P Q)(y, x, s) = W (y|x, s)P (x)Q(s) ,

(18)

PXY S is a joint distribution on X Y S with marginals and conditional distributions PX , PS , PY |X , and D ( ) is the Kullback-Leibler divergence. We will suppress the dependence on W and write Er (R, P, ) = minQQ() Er (R, W, P, Q).

D. Rateless coding and partial channel state information A rateless code is variable-length coding strategy that uses periodic single-bit active feedback from the decoder to terminate the decoding. In our problem formulation, the decoder is periodically given side information consisting of an estimate of the channel. We call the period at which the feedback and side information is made available the chunk size c. We will denote the partial side information available to the decoder after chunk m by Vm , which mc takes values in a set V(c). We will dene y(mc) = y(m1)c+1 , and similarly for x(mc) and s(mc) . A (c, N, K) rateless code is set of maps {(m , m , m ) : m = 1, 2, . . .}:
m : [N ] [K] X c m : Y
mc m mc m

(19) (20) (21)

m : Y

V(c) [K] [N ] .

V(c) [K] {0, 1}

To encode chunk m, the encoder maps the message in [N ] and key [K] into a vector of c channel inputs. The decoder uses a decision function m to decide whether to feed back a 1 to tell the encoder to terminate transmission or a 0 to continue. When m () = 1, then the decoder uses the decoding function m to decode the message. The decision function m denes a random variable, called the decoding time M of the rateless code:
m M = min {m : m (Y1mc , V1 , k) = 1} .

(22)

Let M = {M , M + 1, . . . , M } be the smallest interval containing the support of M. The set of possible rates for the rateless code are given by {(mc)1 log N : m M}. Let
Cmax = max I (P, W (|, s)) ,
sS

(23)

be the maximum rate attainable on the AVC W . For the codes studied in this paper,
M = log N Cmax c .

(24)

and M = n/c. The maximal error and maximal error with input-dependent state for a (c, N, K) rateless code at decoding time M = M are, respectively,
1 r (M, s) = max i[N ] K r (M, J) = max 1 i[N ] K
K

W
k=1 K

M Y1M c : M = M, M (Y1M c , V1 , k) = i M Y1M c : M = M, M (Y1M c , V1 , k) = i

M (i, k), sM c 1 1 M (i, k), JM (i, M (i, k)) 1 1 .

(25)

W
k=1

(26)

Here JM : [N ] X M c S M c . If the state sequence does not depend on the transmitted codeword, then we may assume that the state sequence is xed rst and the random key k is chosen afterwards. We will model the CSI after the m-th chunk as a measurement m R+ with the property that
1 c
mc t=(m1)c+1

l(st ) m .

(27)

That is, the decoder obtains a bound on the average state cost over the chunk. The set of possible values of the average state cost is at most (c + 1)|S| , which is the number of types on S . We will set V(c) = {W() : = s Q(s)l(s), Q Pn (S)} to be the possible values for the average state cost in a chunk. Let us now turn to the case where the channel state can depend on the transmitted codeword. We will model the side information after chunk m as a subset Vm Wdep with the property that
1 c
cm i=c(m1)+1

W (y|x, si )1(xi = x) Vm .

(28)

That is, the decoder is given an estimate of the empirical average channel during the m-th chunk. We can translate many kinds of information about the empirical channel into a set Vm by nding the set of channels in U(P ) that are consistent with the observations and forming a set Vm Wdep (P ). Some examples are given in the next section. In order for our results to hold we will need a polynomial upper bound on the size of V(c). We will assume that |V(c)| cv for some v < . Note that in our standard AVC model the polynomial bound is trivial. In our rateless code constructions we are particularly interested in the case where the chunk size c is sublinear in n, the number of messages N is exponential in n, and the key size K is subexponential in n:
c(n) 0 n

(29) (30) (31)

1 log N (n) n 1 log K(n) 0 . n

III. M AIN

RESULTS , TECHNIQUES , AND EXAMPLES

We prove results for four related coding problems in this paper. For point-to-point channels, we obtain capacity results for AVCs with state constraints using limited common randomness and establish tradeoffs between the key size and error decay. A. Main results As noted by Ericson [21], Csisz r and Narayan [15], Hughes and Thomas [26], the rst step of the elimination a technique [5] can be used to reduce the randomization needed for a randomized code. The capacity under this model is the randomized coding capacity Cr () of the AVC. Theorem 1: Let W = {W (y|x, s) : s S} be an AVC with cost function l() and cost-constraint . For any > 0, there exists a constant such that for n sufciently large the sequence of rate-key pairs (R(n), K(n)) is achievable, where the key size K(n) and the rate and error satisfy
n(R(n) + log |S|) r (n) > , K(n) (log r (n) + ne(Er (R, P , ) )) R(n) = Cr ()

(32) (33)

where Er (R, P , ) is the random coding error exponent for the AVC W , P is the input distribution maximizing (15), and Cr () is the randomized coding capacity of the AVC. We leave the error bound in its current form to provide a exible statement of the results. Suppose K(n) = n . Then the condition (33) becomes:
r (n) > n Cr () + log |S| . (n1 log r (n) + e(Er (R, P , ) ))

(34)

Then we can choose r (n) = O(n1 ). Suppose K(n) = exp(n). Then the condition (33) becomes:
r (n) > exp(n)

Cr () + log |S| 1 log (n) + e(E (R, P , ) (n r r

Then we can choose r (n) = O(exp( n)), where < .

))

(35)

Suppose K(n) = exp(n ) for (0, 1). Then the condition (33) becomes:
r (n) > exp(n )

Then we can choose r (n) = O(exp( n )), where < . For AVCs with nosy noise, the state can depend on the transmitted codeword. We show that the capacity Cdep () is in general smaller than Cr (). By using the list-decodable codes for cost constrained AVCs from our earlier work [31] and combining it with a message authentication scheme used by Langberg [29], we can construct randomized codes for this channel with limited common randomness. Theorem 2: Let W = {W (y|x, s) : s S} be an AVC with cost function l() and cost-constraint . For any > 0, there exists an n sufciently large such that the sequence of rate-key size pairs (R(n), K(n)) is achievable with error r (n), where
R(n) = Cdep () r (n) exp(nE()) +

Cr () + log |S| . (n1 log r (n) + e(Er (R, P , ) ))

(36)

(37)
8nCdep () log |Y| K(n) log K(n) .

(38)

Here Cdep () is given by (16) and is the randomized coding capacity of the AVC with nosy noise. As in the standard AVC, we can evaluate different error tradeoffs here as well: Suppose K(n) = n . Then the condition (38) becomes:
r (n) exp(nE()) +

8nCdep () log |Y| . n/2 log n

(39)

Therefore r (n) = O(n1/2 ). Suppose K(n) = exp(n). Then the condition (38) becomes:
r (n) exp(nE()) + 8nCdep () log |Y| . n exp(n/2)

(40)

Therefore r (n) = O(exp(n min(E(), /2))). Suppose K(n) = exp(n ) for (0, 1). Then the condition (38) becomes:
r (n) exp(nE()) +

8nCdep () log |Y| . n exp(n /2)

(41)

Therefore r (n) = O(exp(n )). For rateless codes, we construct strategies that use concatenated xed-composition codebooks over both standard AVCs and AVCs with nosy noise. For the standard AVC model, we can use the construction of Csisz r and Narayan a [15] as a basis for constructing a randomized rateless code with unbounded key size. By using the elimination technique to partially derandomize this construction we can reduce the key size and establish a tradeoff between the randomization and error. Theorem 3: For any > 0 and input type P P(X ), for n sufciently large there is an (c(n), N (n), K(n)) rateless code for which, given
l(s(mc) ) cm .

(42)

the decoding time M is given by


M = min M : 1 log N I Mc P, W 1 M
M

m
m=1

2 (M )

(43)

For this code, M = n/c and 2 (M ) = (M /M )2. For for all (s, M ), the error of this code satises 1 + Mc log |S| . (44) K (log r (M, s) 1 + Mc (Er (M + 2 (M, c), P, M ) 2)) Again, for this model we must choose concrete K(n) and r (n) scalings to evaluate the key-error tradeoffs as in (34)(36). Indeed, the scalings can be chosen to match those for the point-to-point case. r (M, s) >

10

For the more robust model, where the state can depend on the transmitted codeword, we can also use a codebook of concatenated constant-composition components1. Theorem 4: For any 2 > 0 and input type P P(X ), for n sufciently large, there is an (c(n), N (n), K(n)) rateless code for which, given that
1 c
mc t=(m1)c+1

W (y|x, si )1(xi = x) Vm ,

(45)

the decoding time M is given by


M = min M :
M For all (s, V1 ), error for this code satises

1 1 log N I (P, Vm ) 2 Mc M

(46)

14n log |Y| . (47) 2 K log K Again, we can choose scalings as in (39)(41). In the nal case, the error decay is bounded by the rst term, which is exponential in the chunk size c alone. One question that arises in the analysis of rateless codes is their efciency how well they perform against an optimal decoder. There are two slightly different questions: does the code decode as early as possible? What is loss in rate compared to the empirical rate at the time of decoding? Both questions hinge on the quality of the side information available to the decoder. The rst is answered by the denition of M the decoding time is the earliest possible given the gaps 2 and 2 . We address the second question in the corollaries following the proofs of Theorems 3 and 4 in Sections V and VI and bound the loss in rate in terms of the quality of the channel estimates. r (M, s) M exp(cE(2 )) +

B. Techniques In this paper we use two derandomization strategies in our code constructions. The rst method, pioneered by Ahlswede [5] under the name the elimination technique, is to start with a randomized code that has good properties and then sample from this code to obtain a smaller ensemble of codes that still has good properties. The following lemma gives this result in a form that will be convenient for us in the sequel. A proof is included for completeness. Lemma 1: Let C be an (n, N, J) randomized code with N = exp(nR) whose expected maximal error satises
E[(Ck , s)] = 1 W n (Di,k |k (i), s) (n) ,

(48) (49)

for an AVC W with cost function l() and cost constraint . Then for all satisfying: n > (R + log |S|) . log e K we can nd a (n, N, K) randomized code with probability of error less than .
1

(50)

Try saying that 20 times quickly!

11

x3 , A11 x81 , A21 x4 , A31

message x5 , A12 x16 , A13 x9 , A22 x22 , A23 x63 , A32 x2 , A33

x1 , AK1

x7 , AK2

x7 , AK3

Fig. 3. Constructing a randomized code from a list-decodable code. We put the codewords of the list code into a K N/ K table. Each column has a partition of the set of K keys into sets Aij of K keys each. The intersection of the key sets is small.

Proof: Let C1 , C2 , . . . , CK be K codebooks drawn according to the random variable C. Then we can bound:
P 1 K
K K k=1

(Ck , s)

= P exp r
k=1

(Ck , s)

exp(Kr)
K

exp(Kr)E [exp(r(Ck , s))]K = exp(Kr)E 1 +


m=1

1 m r (Ck , s)m m!

exp(Kr) 1 + (n)

exp(Kr) (1 + (n)e )

m=1 r K

1 m r m!

exp (K(r (n)er )) .

= exp (K(r log(1 + (n)er )))

(51)

Taking a union bound over all s and N = exp(nR) we obtain:


P 1 K
K k=1

(Ck , s) , s

exp (K(r (n)er ) + n(log |S| + R)) .

(52)

The maximal error of our new randomized code will be small as long as K(rt er ) grows faster than n. We can optimize over r to nd r = log(/) so we get the relation > n(log |S| + R) . (53) K log e A second derandomization strategy is to use a list code of small (constant) list size together with a type of cryptographic message authentication system. This construction has been used by Langberg [29] and Smith [35] to construct randomized codes for constrained bit-ipping AVCs in which the codeword is known to the jammer. By using our new list codes we can construct such randomized codes for general AVCs. We will briey describe the scheme and use our new results on list codes to characterize the error probability with the key size and target rate. Lemma 2 (Message Authentication [29]): Let CL be an (n, N, L) deterministic list code of rate R = n1 log N and probability of error . For key size K(n) there exists an (n, N/ K(n), K(n)) randomized code with probability of error + , where 2LnR = . (54) K log K Proof: Let R = n1 log(N/ K). Let i and z be elements of GF ( K), and let the key be given by the pair (i, z). Pick {fj () : j = 1, 2, . . . , 2nR } to be a set of 2nR distinct monic polynomials of degree d 1 over GF ( K). Then let Aij = {(i, zfj (z) + i) : z = 1, 2, . . . , K}.

12

Since i acts as a constant shift of the polynomial zfj (z), it is clear that {Aij : i = 1, 2, . . . , K} is a partition of the set of all keys. Furthermore, for j = j we have |Aij Aij | d, since fj (z) = fj (z) for at most d values of z . We now construct a table as shown in Figure 3. The columns index the N/ K messages for the randomized code, and the rows the K possible values of i. The N codewords {xl } of the list code CL are arbitrarily placed in the table. We also associate Aij with the (i, j)-th cell in the table. The encoder takes a message j and key (i, z) and outputs the codeword of the list code in the (i, j)-th position in the table. Note that knowledge of the transmitted codeword tells the jammer both the message j and set Aij in which the key must lie. The decoder for the randomized code rst decodes using the list code CL to nd a list of at most L candidate codewords {xl1 , xl2 , . . . , xlL }. Each of these codewords has an associated key set {Aij (l1 ), . . . , Aij (lL )} given by the table. The decoder chooses the unique lk for which (i, z) Aij (lk ) (if it exists) and outputs the corresponding message jk . If no such lk exists then we declare an error. There are two possible decoding errors. If the list code has a decoding error then the correct codeword will not be in the list and so the decoder for the randomized code will fail. This happens with probability smaller that by the assumptions on the list code. If the transmitted codeword is in the list produced by the list decoder, then we will have an error if there is not a unique lk for which (i, z) Aij (lk ). That is, we must have (i, z) Aijk for some k = k. We know |Aij Aij | d, so there are at most Ld values of (i, z) for which this can happen. Since the jammer knows i and there are K values for z , the probability that the key cannot disambiguate the list is at most = Ld/ K . The total error probability is then bounded by + . d1 The last part is to choose d appropriately. There are K monic polynomials of degree d 1 over GF ( K), so we need d1 1 K 2nR . (55) K This in turn implies d nR . log K

(56)

Substituting this into the expression for in the previous paragraph we obtain (54). C. Examples A real additive channel. As we discussed in the introduction, the capacity formula may be different depending on whether the state can depend on the input. For our rst example, consider the real adder channel with X = S = {0, 1}, Y = {0, 1, 2}, and y = x + s. The randomized coding capacity under the standard AVC model without constraints is 1/2 [9]. If we choose l(s) = s and pick the state constraint , then we can calculate I P, W() for each . For > 1/2 the capacity is also 1/2, and for smaller there is an improvement in rate, as shown in Figure 4. In the case where the state can depend on the input, the situation is less rosy for an input distribution P = (1 p, p), if 1 p the jammer can change every 0 in the transmitted codeword into a 1, making the output entropy 0 and hence zeroing the capacity. For smaller p the randomized coding capacity may be larger, as shown in Figure 5. Another feature of this channel is that the mutual information does not have a saddle point independent of , so the capacity-achieving input distribution will shift as a function of . Finally, note that for this channel, Cdep () is much smaller than Cr (). BSC mixed with Z-channels. Consider a channel with binary inputs and binary outputs and three states S = {0, 1, 2}. For S = 0 the channel is a binary symmetric channel with crossover probability a, for S = 1 it is a Z-channel with crossover b:
W (y|x, 1) = 1 0 b 1b 1c c 0 1 ,

(57)

and for S = 2 it is an S-channel with crossover c:


W (y|x, 1) =

(58)

13

I(P,W) versus input distribution P (bits)


0.7 = 0.3 = 0.5 = 0.7

0.6

0.5

0.4

I(P,W)
0.3 0.2 0.1 0 0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

Input distribution p = P(X = 1)

` Fig. 4. I P, W() for = 0.3, 0.5, 0.7. The unconstrained capacity is 1/2 and corresponds to p = 1/2. For 1/2 we also have Cr () = 1/2.

I(P,W
0.35

dep

) versus input distribution P (bits)


= 0.3 = 0.5 = 0.7

0.3

0.25

I(P,W

dep

0.2

0.15

0.1

0.05

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

Input distribution p = P(X = 1)

` Fig. 5. I P, Wdep () for = 0.3, 0.5, 0.7. For large p the capacity is 0 and as increases this zero region grows. In addition, as changes the capacity achieving input distribution changes.

14

I(P,Wdep) versus P for BSC/Z example (bits)


0.7 = 0.1 = 0.2 = 0.4

0.6

0.5

I(P,W

dep

0.4

0.3

0.2

0.1

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

Input distribution p = P(X = 1)

Fig. 6.

` I P, Wdep () for = 0.1, 0.2, 0.4 with a = 0.01, b = 0.1, and c = 0.15, and l(1) = 0.5 and l(2) = 0.6.

The S-channel maps 1 to 1 with probability 1 and 0 to 1 with probability c. We will set l(0) = 0 and let l(1) and l(2) be arbitrary. This approach to modeling a channel assumes that the overall channel is a mixture of channels with known characteristics. Figure 6 shows a plot of I P, Wdep () versus P for this example. The plot shows that for different values of the capacity achieving distribution can shift signicantly when the state can depend on the input. IV. C ODING
FOR POINT- TO - POINT LINKS

In this section we discuss the problem of coding over channels given by (3) in which a cost constraint of the form (5) is known to the encoder and decoder. Our interest is in limiting the size of the secret key to be shared between the encoder and decoder. Our results show that a key of size log K = O(log n) bits is sufcient to achieve the randomized coding capacity for point-to-point AVCs with cost constraints under the standard and and input-dependent-state error criteria. The former result was observed by Csisz r and Narayan [16] and the proof is a included here for completeness. For the more stringent error model we extend the result of [29] on bit-ipping channels to general cost-constrained AVCs in which the codeword is known to the jammer. In view of Lemma 2, we need to nd good list-decodable codes, which output a list of candidate codewords. This relaxation of the coding requirements makes it easier to show the existence of good codes, and we cite the relevant theorem from our earlier work [31] to obtain such a list code and then apply Lemma 2. A. Proof of Theorem 1 For the standard cost-constrained AVC, we state for comparison a result noted by Csisz r and Narayan [15] and a Hughes and Thomas [26]. The proof follows by applying the rst part of Ahlswedes elimination technique [5] in the same manner as Ericson [21] to obtain a series of tradeoffs between the key size and error decay. Proof: Fix > 0. Hughes and Thomas [26, Theorem 6] show that for n sufciently large and N = exp(n(Cr () )), there exists an (n, N ) code C of constant type P such that the (n, N, n!) randomized code C constructed by concatenating C with a permutation k selected uniformly from the set n of all permutations of length n has maximal error
(C) exp (n (Er (R, P , ) )) ,

(59)

15

where R = n1 log N , Er is a random coding exponent and Q is the empirical type of s. Thus we can apply Lemma 1 to this code to show that there exists a set of K permutations {1 , . . . , K } such that {k C : k [K]} is a codebook with error r (n) satisfying n r (n) (log r (n) + ne(Er (R, P , ) )) > (R + log |S|) . (60) K(n)

B. Proof of Theorem 2 The arbitrarily varying channel with deterministic codes and maximal error is directly related to the design of error correcting codes. Because this is a difcult problem, we can instead consider list decoding, a relaxation in which the decoder is allowed to output a small list and we need only guarantee the transmitted message is in the list. We cite the following result from our earlier work [31]. Theorem 5 (List decoding for maximal error): Let W = {W (|, s) : s S} be an arbitrarily varying channel with constraint function l(s) and state constraint . For any > 0 and rate R = Cdep () , there is an n sufciently large and an (n, exp(nR), L) list code with
L< 4 log |Y| +1 ,

(61)

and probability of error


L exp(nE()) .

(62)

The result is proved in two steps rst we claim that list codes of exponential list size exist, and then we construct a code of nite list size by sampling codewords from the larger list code. This line of argument follows that developed by Ahlswede [4], [7]. Proof: (Proof of Theorem 2). We can use the previous lemma with our result on list codes to achieve the desired tradeoff. Let L (n) = Cdep () RL , where RL is the rate for a list code of list size L. For the randomized code construction in Lemma 2 and the bound on the rate loss from Theorem 5, the extra error probability is
8nCdep () log |Y| 2LnR < . K log K L (n) K log K

(63)

Thus for large enough n we can bound the error by exp(nE()) + . Finally, we note that the jammer can choose a memoryless strategy U (s|x) U(P, ). Choosing the worst U yields a discrete memoryless channel whose capacity is Cdep , and hence Cdep is indeed the randomized coding capacity of this channel. This theorem gives some tradeoffs between error decay, key size, and rate loss. We could equally well phrase the result by xing K(n) rst and nding the corresponding expressions. V. R ATELESS
CODING FOR STANDARD

AVC S

WITH COST INFORMATION

In this section we construct a rateless coding scheme for the standard AVC model with cost constraints that uses cost estimates at the decoder to opportunistically decode when it has received enough channel symbols. A. Coding strategy Our coding strategy is nearly identical to the strategy in the next section, except that we restrict ourselves to the case of state cost information at the receiver. Because we consider the standard AVC model, we assume that the state sequence cannot depend on the actual channel inputs. Our scheme uses a xed maximum blocklength n and we will express other parameters as functions of n. Our coding strategy may be described as follows: Algorithm I : Rateless coding for standard AVCs 1) The encoder and decoder agree on a key k [K] to use for their transmission. The encoder chooses a message i [N ] to transmit and maps it into a codeword x(i, k) X n .

16

2) In channel uses (m 1)c + 1, (m 1)c + 2, . . . , mc, the encoder transmits x(mc) . 3) The decoder receives channel outputs y(mc) and an estimate m of the state cost in the m-th chunk such that
1 c
mc t=(m1)c+1

l(st ) j .
m

(64)

Let
1 m = m
mc m (y1 , m , k) = 1 1

j
j=1

(65)

log N < I P, m 2 (M ) mc

(66)

mc If m = 1 then the decoder attempts to decode the received sequence, sets = m (y1 , k), and feeds back i a 1 to terminate transmission. Otherwise, the decoder feeds back a 0 and we return to step 2) to send chunk m + 1. Our code relies on the existence of a set of codewords {x(i, k)} which, when truncated to blocklength mc, form a good randomized code for a certain AVC. The key to our construction is that the condition checked by the decision function (66) is sufcient to guarantee that the decoding error will be small.

B. Codebook construction Our codebook will consist of codewords drawn uniformly from the set
c c c c T = (TP )n/c = TP TP TP . n/c times

(67)

That is, the codewords are formed by concatenating constant-composition chunks of length c. For a xed set of N messages to be transmitted, the minimum rate supportable by this codebook is = n1 log N , so an upper bound on M is M = n/c. The maximum rate is given by the randomized coding capacity with cost constraint 0 on the state. We will back away from this rate and dene a lower bound M on M by n = Cr (0) 2 (M ) , (68) M c We will let M = {M , M + 1, . . . , M }. To make our results easier to state, for n, M , c, , and 2 let us dene M to be the largest cost value such that n = I(P, M ) 2 (M ) . (69) Mc Lemma 3 (Derandomized rateless codebook): For any > 0, input type P P(X ) there is an n sufciently large and an (c, N, K) rateless code with rate and error at blocklength M c is given by
M = I(P, M ) 2 (M ) M >

(70)
,

where c1 log n 0 and

K (M + M c (Er (M + 2 (M ), P, M ) 2) 1) 2 (M ) = M 2 . M

log |S M c (M )|

(71)

(72)

17

C. Proof of Theorem 3 Proof: Fix > 0. Then Lemma 3 gives a codebook with the desired error as long as the M given by the stopping rule guarantees that the state sequence will have cost no more than M . If m satises (64) then we can see from the stopping rule that l sM c M cM . (73) 1 Therefore Lemma 3 gives the desired error bound at the decoding time M. We now address the efciency of our scheme and show that the loss in rate can be made arbitrarily small if the channel estimates are within 3 of the true channel cost. Corollary 1: Let m denote the true empirical cost in the m-th chunk:
1 c
mc

l(st ) = m ,
t=(m1)c+1

(74)

and let
M = 1 m . M m=1
M

(75)

If m m 3 for all m, then we have 1 log N I (P, M ) 1 + f (3 ) . Mc

(76)

Let V W(M ) and V W(M ) be the channels corresponding to Q and Q. Given the bound on dmax Q, Q , we can bound dmax V , V = max
x,y s

where f (3 ) = O(3 log 1 ). 3 Proof: Note that M M 3 as well. Let Q(s) Q(M ) be a distribution on S that minimizes I P, W(M ) . We will nd a distribution Q Q(M ) such that dmax Q, Q is small. Let s1 = argmin l(s) and s2 = argminl(s)=l(s1 ) l(s). Given a gap in cost 3 and a distribution Q, we can construct a distribution Q by setting Q(s1 ) = Q(s1 ) + and removing mass from elements s with higher cost than s1 . The largest that this can incur can be bounded: 3 dmax Q, Q = . (77) l(s2 ) l(s1 )

W (y|x, s)Q(s)

W (y|x, s)Q(s)
s

(78) (79) (80)

max
x,y

W (y|x, s)|Q(s) Q(s)| .

|S|dmax Q, Q I (P, V ) I P, V

Let g(3 ) = |S|3 /(l(s2 ) l(s1 )) Now, using a bound from [22], we can bound the mutual information gap:
= I (P, V ) I P, V

(81) (82)

2(|Y| 1)hb (g(3 )) + 2(|Y| 1) log(|Y 1))g(3 )

Let f (3 ) be this upper bound. Now, 1 log N I (P, M ) I (P, M ) I P, M + 1 Mc I (P, V ) I P, V + 1


1 + f (3 ) .

(83) (84) (85)

18

VI. R ATELESS

CODING WITH PARTIAL STATE INFORMATION

We now turn to a rateless coding construction for AVCs where the state sequence can depend on the transmitted codeword but we have partial side information about the empirical channel at the decoder. This construction combines the rateless code from the previous section with the list-decoding approach of Theorem 2. A. Coding strategy In order to make our decoder opportunistic, we explicitly utilize information about the output sequence y at the decoder. To wit, for a xed P , given the m-th chunk of channel outputs y(mc) and a side information set Vm ,
Vm (y(mc) , 1 ) = V Vm : dmax Ty(mc) ,
x

P (x)V (y|x)

< 1

(86)

We suppress the dependence of Vm 1 ) on P and will use the concatenated composition-P codebook of (67). Algorithm II : Rateless coding for nosy noise 1) Using common randomness, the encoder and decoder choose a key k [K] to use. The encoder chooses a message i [N ] to transmit and maps it into a codeword x(i, k) X n . 2) In channel uses (m 1)c + 1, (m 1)c + 2, . . . , mc, the encoder transmits xmc (m1)c+1 . (mc) and the channel state information set V and calculates the set 3) The decoder receives channel outputs y m of possible channels Vi (y(mc) , 1 ). If
log N 1 < mc m
m i=1

(y(mc) ,

I P, Vi (y(mc) , 1 ) mc2 .

(87)

then the decoder feeds back 1 and attempts to decode. Otherwise, it feeds back a 0 and we return to step 2) for chunk m + 1. For each chunk, the decoder looks over all channels in the side information set consistent with what it received, and takes the worst-case mutual information. The average of these worst-case mutual informations is our estimate of the empirical mutual information of the channel. B. Codebook construction The codebook we use is again sampled from T , but this time we use a two-step decoding process. Given a decoding time, the decoder list-decodes the received sequence using the partial side information. It then uses the key to hash the list using the same message authentication scheme as in Theorem 2. The following Lemma shows that a codebook with desirable properties exists. Lemma 4 (Concatenated codes with constant list size): For any 2 > 0, P P(X ), there is an n large enough, constant L, and a set of codewords {x(j) : j [N ]} with N = exp(n) such that given any CSI sequence (V1 , V2 , . . . , VM ) and channel output with decoding time M given by (87), the truncated codebook {xM c (j) : j 1 [N ]} is an (M c, N, L) list decodable code for constant list size L satisfying
L> 6 log |Y| , 2

(88)

and probability of decoding error


L (M ) M exp(cE(2 )) ,

(89)

19

C. Proof of Theorem 4 Proof: We will use the codebook from Lemma 4. Since the set of messages of xed size N , we use the construction of Lemma 2. This makes the code, when decoded at M c an (M c, exp(n)/ K(n), K(n)) randomized code with probability of error
2Ln . r (M, s) M exp(cE(2 )) + K log K

(90)

Then we can use choose L = 7(log |Y|)/2 to get


r (M, s) M exp(cE(2 )) +

14n log |Y| . 2 K log K

(91)

In the case where our channel estimates are accurate to within 3 , we can provide a bound on the total loss from the optimal performance knowing the true channel Vm for each chunk m. Corollary 2: Let Vm denote the true empirical channel in the m-th chunk from the left hand side of (28). If
Vm {V : dmax (Vm , V ) < 3 } ,

(92)

then we have
1 1 log N Mc M
M m=1

I (P, Vm )

1 log |Y| + 1 + f (3 ) . M

(93)

where f (3 ) = 2(|Y| 1)hb (3 ) + 2(|Y| 1) log(|Y 1))3 . Proof: We know that for all V VV that |V (y|x) Vm (y|x)| < 3 , so a simple bound [22] gives us |I (P, V ) I (P, Vm ) | f (3 ). Then
M M

log N c

I (P, Vm ) = c
m=1 m=1 M

I (P, Vm ) log N
M1

(94)
I (P, Vm ) + (M 1)c1

m=1

I (P, Vm ) c
M1

(95)

m=1

cI (P, VM ) + c

m=1

|I (P, Vm ) I (P, Vm )| + (M 1)c1

(96) (97)

c log |Y| + (M 1)c(1 + f (3 )) .

VII. D ISCUSSION In this paper we provided code constructions for arbitrarily varying channels in cases where the jammer has full or no knowlege of the transmitted codeword. For standard AVCs where the jammer has no knowledge of the codeword, we used a code construction suggested by previous authors [15], [26] that trades off the error probability with the key size. For AVCs where the jammer has knowledge of the transmitted codeword, we found the randomized coding capacity and provided a code construction based on list-decodable codes with a different error/key tradeoff. Although in some examples the capacity formulae are the same, including the binary modulo-additive channel, in others the capacity may be lower under the more pessimistic error model. One interesting model for these point-to-point channels that we did not address is the case where the jammer has noisy access to the transmitted codeword. This can happen, for example, when the jammer is eavesdropping on a wireless multihop channel. Our derandomization strategies are tailored to the extreme ends of our channel model, where the jammer has no knowledge or full knowledge. A unied coding scheme that achieves capacity for a range of assumptions on the jammers knowledge may help unify the two approaches. Our derandomization strategies extend to the study of rateless coding when the decoder can obtain an estimate of the empirical channel. Under the standard AVC model, we study the case where decoder obtains an estimate

20

of the average state cost over a chunk. We construct a coding strategy and bound its loss from the true empirical mutual information in terms of the channel estimation error. For channels with the more robust error model, we construct a list-code based strategy that is more opportunistic and list decodes on a chunk-by-chunk basis. This architecture may be interesting for more practical code constructions, given the recent attention to list decoding with soft information [28]. We note in passing that our list-coding strategy can be used with side information sets that enforce the standard AVC model by setting V = Q(), but that this code construction yields a less desirable error-key tradeoff. The channel model with nosy noise is related to the model studied by Agarwal, Sahai, and Mitter [1]. They x the input distribution P and a distortion function d(x, y) between the channel inputs and outputs and show that for a xed distortion level D, the maximum rate of reliable communication is the rate distortion function RP (D) for the source P . In some cases, their model and ours is identical. For example, for channels whose output is the sum of the state and input, a difference distortion measure can be generated from the state cost function. In our second example of mixed BSC and Z-channels, it is unclear how to apply the rate-distortion framework. In this case, the distortion perspective is not as natural as the AVC model. An additional application of the results and techniques of this paper is to the problem of communicating over channels with individual state sequences as studied in [22]. The common randomness required to use our code constructions can be generated from zero-rate noiseless feedback from the decoder to the encoder. In the scheme presented in [22], the partial channel state information is generated by training sequences in the forward link. Although we do not consider the provenance of our partial side information in this paper, feed-forward training or side-information channels are an important practical means for the decoder to obtain such measurements. Finally, although the results in this paper are for nite alphabets, extensions to continuous alphabets and the Gaussian AVC setting [17], [23], [24] should be possible using appropriate approximation techniques. An interesting rateless code using lattice constructions has been proposed by Erez et al. in [20], and it would be interesting to see if that approach can work for more robust channel models. A PPENDIX A. Codebook for standard AVCs Lemma 5 (Fully randomized rateless codebook): For any > 0 and input distribution P P(X ) there exists an blocklength n sufciently large and chunk size c(n) with c1 log n 0 such that the randomized codebook {X(i) : i [N ]} of size exp(n) uniformly distributed on T has the following property: or any M M, this codebook truncated to blocklength M c is a randomized codebook of rate n M = , (98) Mc and error
$$\bar\epsilon_M \le \exp\bigl(-Mc\,(E_r(\lambda_M + 2\delta(M), P, \Lambda_M) - 2\delta(M))\bigr), \qquad (99)$$

for the AVC $W$ with cost constraint $\Lambda_M$, provided that
$$\lambda_M < I(P, \Lambda_M) - 2\delta(M). \qquad (100)$$

Proof: We will prove that for each $M \in \mathcal{M}$ there exists a randomized codebook $\mathcal{C}_M$ of blocklength $Mc$ with the specified error for the AVC with cost constraint $\Lambda_M$. This codebook will be equal in distribution to the codebook $\mathcal{C}_{\bar M}$ of blocklength $\bar{M}c$ truncated to blocklength $Mc$.

Standard randomized codebook. Fix $M$ and let $\mathcal{A}$ be a randomized codebook of $\bar{A}$ codewords drawn uniformly from the constant composition set $T_P^{Mc}$. From [26, Theorem 1] we have the following error bound on message $i$, with $R_M + \delta = (Mc)^{-1}\log\bar{A}$, for an AVC with cost constraint $\Lambda_M$:
$$\epsilon_M(\mathcal{A}, i) \le \exp\bigl(-Mc\,(E_r(R_M + \delta, P, \Lambda_M) - \delta)\bigr). \qquad (101)$$
Let $\bar\epsilon_M$ denote this upper bound. A fortiori, we have the same bound on the average error:
$$\frac{1}{\bar A}\sum_{i=1}^{\bar A} \epsilon_M(\mathcal{A}, i) \le \bar\epsilon_M. \qquad (102)$$
Expurgation. Let $\mathcal{B}$ be a random variable formed by expurgating all codewords not in the set $(T_P^c)^M$. That is, we keep only those codewords which are piecewise constant composition with composition $P$. We write $\mathcal{B} = \mathcal{A} \cap (T_P^c)^M$. Note that a realization of $\mathcal{B}$ has a variable number of codewords. We declare an encoding error if the number of codewords in $\mathcal{B}$ is smaller than $\bar{B}$ for some number $\bar{B}$. We use a combinatorial bound [22]:
$$\frac{|T_P^c|^M}{|T_P^{Mc}|} \ge \exp\bigl(-M \log(Mc)\,\sigma(P)\bigr), \qquad (103)$$
where $\sigma(P) < \infty$ is a positive constant. Let us denote this lower bound by $\mu_M$.
Since $\mathcal{A}$ is formed by i.i.d. draws from $T_P^{Mc}$, the codebook size $|\mathcal{B}|$ is the sum of $\bar{A}$ Bernoulli random variables with parameter greater than $\mu_M$. Hence we can use Sanov's theorem [12]:
$$\mathbb{P}(|\mathcal{B}| \le \bar{B}) \le (\bar{A} + 1)^2 \exp\bigl(-\bar{A}\, D(\bar{B}/\bar{A} \,\|\, \mu_M)\bigr). \qquad (104)$$
We can bound the exponent by using the inequality $(1-a)\log(1-a) \ge -2a$ for small $a$ and discarding the small positive term $-\bar{A}(1-\bar{B}/\bar{A})\log(1-\mu_M)$:
$$\bar{A}\, D(\bar{B}/\bar{A} \,\|\, \mu_M) = \bar{B}\log\frac{\bar{B}/\bar{A}}{\mu_M} + \bar{A}\left(1-\frac{\bar{B}}{\bar{A}}\right)\log\frac{1-\bar{B}/\bar{A}}{1-\mu_M} \qquad (105)$$
$$\ge \bar{B}\log\frac{\bar{B}/\bar{A}}{\mu_M} - 2\bar{B} \qquad (106)$$
$$= \bar{B}\left(\log\frac{\bar{B}}{\bar{A}\mu_M} - 2\right). \qquad (107)$$
If we let $\bar{B}/\bar{A} = \mu_M\nu_M$ then
$$\mathbb{P}(|\mathcal{B}| \le \bar{B}) \le \exp\bigl(-\bar{A}\mu_M\nu_M(\log\nu_M - 2) + 2\log(\bar{A}+1)\bigr). \qquad (108)$$
Since $\bar{A} = O(\exp(Mc\lambda))$, the probability of encoder error is much smaller than the decoding error bound $\bar\epsilon_M$. The encoder using $\mathcal{B}$ now operates as follows: it draws a realization of the codebook and declares an error if the realization contains fewer than $\bar{B}$ codewords. Otherwise it transmits the $i$-th codeword in the codebook for message $i \in [\bar{B}]$. Note that this construction uses only $\bar{B}$ codewords from each codebook. The average error on the fraction $\bar{B}/\bar{A} = \mu_M\nu_M$ of preserved codewords can be at most $\bar{A}/\bar{B}$ times the original average error:
$$\frac{1}{\bar B}\sum_{i=1}^{\bar B} \epsilon_M(\mathcal{B}, i) \le \frac{\bar A}{\bar B}\,\bar\epsilon_M \qquad (109)$$
$$= \frac{\bar\epsilon_M}{\mu_M\nu_M}. \qquad (110)$$

Permutation. We now form our random codebook $\mathcal{C}$ by taking the codebook induced by the encoder using $\mathcal{B}$ and concatenating a permutation of the message index. The encoder using $\mathcal{C}$ takes a message $i$, a randomly chosen permutation $\pi$ on $[\bar B]$, and a codebook realization $B$ of $\mathcal{B}$, and outputs the codeword $\pi(i)$ from $B$. The randomized codebook $\mathcal{C}$ is the image of this encoding. The maximal error for a message $i$ in this codebook is given by
$$\epsilon_M(\mathcal{C}, i) = \frac{1}{\bar B!}\sum_{\pi}\epsilon_M(\mathcal{B}, \pi(i)) \qquad (111)$$
$$= \frac{1}{\bar B}\sum_{i=1}^{\bar B}\epsilon_M(\mathcal{B}, i) \qquad (112)$$
$$= \frac{\bar\epsilon_M}{\mu_M\nu_M} \qquad (113)$$
$$\le \frac{1}{\nu_M}\exp\left(Mc\,\frac{\sigma(P)\log(Mc)}{c}\right)\exp\bigl(-Mc\,(E_r(R_M+\delta, P, \Lambda_M) - \delta)\bigr). \qquad (114)$$
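To make the expurgation and permutation steps concrete, here is a small Python sketch of the construction for a toy alphabet. The helper names, the uniform sampling by shuffling, and the specific parameters are illustrative assumptions; the proof only relies on the bounds derived above.

```python
import random
from collections import Counter

def has_chunk_composition(word, comp, c):
    """True if every length-c chunk of `word` has empirical composition `comp`."""
    return all(Counter(word[m * c:(m + 1) * c]) == comp
               for m in range(len(word) // c))

def expurgate_and_permute(comp, M, c, A_bar, B_bar, rng):
    """Illustrative version of the expurgation + permutation steps.

    Draw A_bar codewords uniformly from the constant-composition set of
    blocklength M*c, keep only the piecewise constant-composition ones,
    declare an encoding error if fewer than B_bar survive, and randomly
    permute the message-to-codeword assignment.
    """
    # A uniform draw from the constant-composition set is a uniform shuffle
    # of any fixed word with the prescribed symbol counts.
    base = [sym for sym, cnt in comp.items() for _ in range(cnt * M)]
    candidates = []
    for _ in range(A_bar):
        w = base[:]
        rng.shuffle(w)
        candidates.append(tuple(w))
    # Expurgation: keep distinct codewords whose every chunk has composition P.
    survivors = list(dict.fromkeys(
        w for w in candidates if has_chunk_composition(w, comp, c)))
    if len(survivors) < B_bar:
        raise RuntimeError("encoding error: fewer than B_bar codewords survived")
    # Permutation: message i is sent as the pi(i)-th surviving codeword.
    pi = rng.sample(range(len(survivors)), B_bar)
    return [survivors[j] for j in pi]

# Toy example: per-chunk composition of three 0s and one 1, two chunks of length 4.
rng = random.Random(0)
codebook = expurgate_and_permute(Counter({0: 3, 1: 1}), M=2, c=4,
                                 A_bar=2000, B_bar=16, rng=rng)
```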

Let $\mathcal{C}_M$ denote this codebook for $M \in \mathcal{M}$.

Nesting. Now consider the codebook $\mathcal{C}_{\bar M}$ and set the size of the codebook to $\exp(n\lambda)$. We can write the rate of our original codebook $\mathcal{A}_{\bar M}$ as
$$R_{\bar M} = \lambda - \frac{1}{\bar M c}\log(\mu_{\bar M}\nu_{\bar M}). \qquad (115)$$
Therefore if we truncate $\mathcal{C}_{\bar M}$ to blocklength $Mc$, the resulting randomized code is identically distributed to $\mathcal{C}_M$ with $\nu_M = \mu_{\bar M}\nu_{\bar M}/\mu_M$. Truncating $\mathcal{C}_{\bar M}$ to blocklength $Mc$ for $M \in \mathcal{M}$ thus gives a code of rate
$$R_M = \frac{n\lambda}{Mc} - \frac{1}{Mc}\log(\mu_M\nu_M). \qquad (116)$$
For $n$ and $c$ sufficiently large,
$$R_M \le \lambda_M + \frac{\log n}{c} + \frac{1}{Mc}\,\sigma(P)\log(Mc) < \lambda_M + \delta(M). \qquad (117)$$
We can write the error as
$$\epsilon_M(\mathcal{C}_M, i) \le \exp\bigl(-Mc\,(E_r(\lambda_M + 2\delta(M), P, \Lambda_M) - 2\delta(M))\bigr), \qquad (118)$$
where
$$\delta(M) \ge \delta + \frac{\sigma(P)\log(Mc)}{c} + \frac{\log n}{c}. \qquad (119)$$
The last thing to do is choose $\delta(M)$ to make the exponent positive. We need
$$\lambda_M + 2\delta(M) < I(P, \Lambda_M). \qquad (120)$$
With this lemma in hand, we can apply Lemma 1 to derandomize the randomized code above.

Proof: Let $\mathcal{C}$ be the codebook-valued random variable that is the randomized code from Lemma 5. For each $M$, let $\mathcal{C}_M$ be the codebook truncated to blocklength $Mc$. We know that $\mathcal{C}_M$ forms a good randomized codebook with error (99) for the AVC with cost constraint $\Lambda_M$ in (100). Let us write $\bar\epsilon_M$ for the upper bound in (99). By applying Lemma 1 to $\mathcal{C}_M$ we see that the probability of $K$ codebooks sampled uniformly from $\mathcal{C}_M$ failing to have error bounded by $\epsilon_M$ is upper bounded by
$$\exp\left(-K\epsilon_M\log\frac{\epsilon_M}{e\,\bar\epsilon_M} + n\lambda + Mc\log|\mathcal{S}|\right). \qquad (121)$$
Since the marginal distribution of a codeword sampled uniformly from $\mathcal{C}_{\bar M}$, truncated to blocklength $Mc$, is uniform on the support of $\mathcal{C}_M$, we can choose the errors $\epsilon_M$ simultaneously for all truncations. Thus the error $\epsilon_M$ for each $M \in \mathcal{M}$ must satisfy
$$K\epsilon_M\bigl(\log\epsilon_M - 1 + Mc\,(E_r(\lambda_M + 2\delta(M), P, \Lambda_M) - 2\delta(M))\bigr) > n\lambda + Mc\log|\mathcal{S}|. \qquad (122)$$
Therefore we may choose $\epsilon_M$ to satisfy
$$\epsilon_M > \frac{n\lambda + Mc\log|\mathcal{S}|}{K\bigl(\log\epsilon_M - 1 + Mc\,(E_r(\lambda_M + 2\delta(M), P, \Lambda_M) - 2\delta(M))\bigr)}. \qquad (123)$$

B. Codebook for AVCs with input-dependent state

Over the next few lemmas, we will construct a list code that can exploit the partial side information on a chunk-by-chunk basis. Our first lemma extends Lemma 2 from [31] to the case where the decoder has extra information about the average empirical channel. In the construction from [31], the codebook is the complete typical set $T_P^c$ and the decoder outputs the union over all shells from the output sequence. We can use the CSI to modify the decoder and thereby reduce the list size.

Lemma 6 (Exponential list decoding with variable side information): For any $\delta_1 > 0$, $P \in \mathcal{P}(\mathcal{X})$, and $\mathcal{V} \in \mathbb{V}(c)$, there is a $c$ sufficiently large such that the set $T_P^c$ is a $(c, N, L(\mathcal{V}))$ list code with
$$N = |T_P^c| \ge \exp\bigl(c(H(X) - \delta_1)\bigr), \qquad (124)$$
$$L(\mathcal{V}) \le \exp\left(c\left(\max_{V \in \mathcal{V}(y_1^c,\delta_1)} H(X|Y) + \delta_1\right)\right), \qquad (125)$$
and error
$$\epsilon_L \le \exp(-cE_1(\delta_1)). \qquad (126)$$
Proof: Fix $\delta_1 > 0$. For an input distribution $P(x)$ and channel $V(y|x)$, let $\hat{P}(y)$ be the marginal distribution on $\mathcal{Y}$ and $\hat{V}(x|y)$ be the channel such that $P(x)V(y|x) = \hat{P}(y)\hat{V}(x|y)$. Our decoder will output the set
$$\mathcal{L}(y_1^c) = \bigcup_{V \in \mathcal{V}(y_1^c,\delta_1)} T_{\hat{V}}^{(|\mathcal{X}|+1)\delta_1}(y_1^c). \qquad (127)$$
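The decoder in (127) can be sketched in Python as follows for toy alphabets: it keeps every input word whose joint type with the received chunk is close to $P(x)V(y|x)$ for some channel $V$ consistent with the side information. The typicality test here is a simplified stand-in for membership in the shells $T_{\hat V}(y_1^c)$, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

def joint_type(x, y):
    """Empirical joint type of two equal-length sequences."""
    n = len(x)
    counts = Counter(zip(x, y))
    return {pair: counts[pair] / n for pair in counts}

def consistent(x, y, P, V, slack):
    """Simplified typicality test playing the role of x lying in the shell of
    the reverse channel V-hat given y: the joint type of (x, y) is within
    `slack` of P(a) * V(b|a) in every coordinate."""
    t = joint_type(x, y)
    xs, ys = list(P), sorted({b for row in V.values() for b in row})
    return all(abs(t.get((a, b), 0.0) - P[a] * V[a].get(b, 0.0)) <= slack
               for a, b in product(xs, ys))

def chunk_list_decode(y, codebook, P, side_info_channels, delta1):
    """Output the union, over channels V in the side information set, of the
    codewords consistent with y under V (cf. (127))."""
    slack = (len(P) + 1) * delta1
    return [x for x in codebook
            if any(consistent(x, y, P, V, slack) for V in side_info_channels)]
```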

The size of the set $\mathcal{L}(y_1^c)$ is, by a union bound, upper bounded by (125). The proof of Lemma 1 in [31] shows that the probability that the transmitted $x$ is not in this set, or that
$$y_1^c \notin \bigcup_{V \in \mathcal{V}(y_1^c, |\mathcal{X}|\delta_1)} T_{V}^{\delta_1}(x), \qquad (128)$$
is upper bounded by
$$\epsilon_L(\mathbf{W}) \le \exp(-cE_L(\mathbf{W},\delta_1)). \qquad (129)$$
For c sufciently large, the size of this list can be bounded by (125), and the error probability is still bounded by
$$\epsilon_L(\mathbf{W}) \le \exp(-cE_L(\mathbf{W},\delta_1)). \qquad (130)$$

Thus, with probability exponentially close to one in $c$, this set will contain the transmitted $x \in T_P^c$. Taking a union bound over the $|\mathbb{V}(c)| = c^{v}$ possible values of the side information $\mathcal{V}$ shows that
$$\epsilon_L \le \exp\bigl(-(cE_L(\mathbf{W},\delta_1) - v\log c)\bigr), \qquad (131)$$

which gives the exponent $E_1(\delta_1)$.
With the previous lemma as a basic building block, we can create nested list-decodable codes where $c$ is chosen to be large enough to satisfy the conditions of Lemma 6. The codebooks we will consider are
$$T_{\bar M} = (T_P^c)^{\bar M} = \underbrace{T_P^c \times T_P^c \times \cdots \times T_P^c}_{\bar M \text{ times}}. \qquad (132)$$

We will fix a number $\lambda$ and a set of $N = \exp(n\lambda)$ codewords for our rateless code construction. Let $\mathcal{M} = \{\underline{M}, \underline{M}+1, \ldots, \bar{M}\}$, where $\bar{M} = n/c$ and $\underline{M} = n\lambda/(c\,\min\{\log|\mathcal{X}|, \log|\mathcal{Y}|\})$. The set $\mathcal{M}$ is the set of possible decoding times for our code.

Lemma 7 (Concatenated exponential list codes): For any $\delta_1 > 0$, $P \in \mathcal{P}(\mathcal{X})$, and $\mathcal{V}_1^M = (\mathcal{V}_1, \mathcal{V}_2, \ldots, \mathcal{V}_M) \in \mathbb{V}(c)^M$, there is a $c$ sufficiently large such that the set $T_M$ is an $(Mc, N_M, L(\mathcal{V}_1^M))$ list code with
$$N_M \ge \exp\bigl(Mc(H(X) - \delta_1)\bigr), \qquad (133)$$
$$L(\mathcal{V}_1^M) \le \exp\left(c\sum_{m=1}^{M}\max_{V \in \mathcal{V}_m(y_{(mc)},\delta_1)} H(X_m|Y_m) + Mc\delta_1\right), \qquad (134)$$
and error
$$\epsilon_L \le M\exp(-cE_2(\delta_1)). \qquad (135)$$
Proof: Choose $c$ large enough to satisfy the conditions of Lemma 6. Our decoder will operate by list decoding each chunk separately. Let $L_m$ be the list size guaranteed by Lemma 6 for the $m$-th chunk. Then the corresponding upper bound on $\prod_{m=1}^{M} L_m$ is the desired upper bound on $L(\mathcal{V}_1^M)$. The probability of the list in some chunk not containing the corresponding transmitted chunk can be upper bounded:
$$\epsilon_L \le M\exp(-cE_1(\delta_1)). \qquad (136)$$

As long as c grows faster than log M the decoding error will still decay exponentially.
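A sketch of the chunk-by-chunk decoder in this proof, reusing the hypothetical chunk_list_decode from the sketch after (127): a codeword of blocklength $Mc$ stays on the list exactly when each of its $M$ chunks appears on the corresponding per-chunk list.

```python
def concatenated_list_decode(y, codebook, P, side_info_per_chunk, delta1, c):
    """List decode each length-c chunk separately (as in Lemma 7) and keep the
    codewords all of whose chunks survive; the final list size is at most the
    product of the per-chunk list sizes."""
    M = len(y) // c
    surviving = list(codebook)
    for m in range(M):
        y_chunk = y[m * c:(m + 1) * c]
        chunk_words = {x[m * c:(m + 1) * c] for x in surviving}
        good = set(chunk_list_decode(y_chunk, list(chunk_words), P,
                                     side_info_per_chunk[m], delta1))
        surviving = [x for x in surviving if x[m * c:(m + 1) * c] in good]
    return surviving
```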

The next lemma says that we can subsample the list code $T_{\bar M}$ to get a set of codewords that is an $(Mc, N, L)$ list decodable code with constant $L$, for every truncation $M$, and for all side information sequences. In subsampling, we can define, for each truncation $M$, output sequence $y_1^{Mc}$, and side information sequence $(\mathcal{V}_1, \ldots, \mathcal{V}_M)$, a decoding bin
$$B(M, y_1^{Mc}, \mathcal{V}_1^M) \subseteq \mathcal{X}^{Mc}, \qquad (137)$$
which is the list given by the code in Lemma 7. The size of each bin is
$$|B(M, y_1^{Mc}, \mathcal{V}_1^M)| \le \exp\left(c\sum_{m=1}^{M}\max_{V \in \mathcal{V}_m(y_{(mc)},\delta_1)} H(X_m|Y_m) + Mc\delta_1\right). \qquad (138)$$
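Before the counting argument, it may help to see how the bins are meant to be used at run time. The sketch below assumes the stopping rule referenced as (87) has the same form as the inequality that reappears in (143), with per-chunk quantities standing in for the worst-case mutual information over the m-th side information set; the names and the interface are illustrative only.

```python
def stopping_time(n_lambda, c, per_chunk_info, delta2):
    """Return the first chunk index M at which the accumulated worst-case
    mutual information clears the target n*lambda plus the slack M*c*delta2
    (mirroring (143)); return None if decoding never becomes possible."""
    total = 0.0
    for m, info in enumerate(per_chunk_info, start=1):
        total += c * info
        if n_lambda < total - m * c * delta2:
            return m
    return None

# Example: target of n*lambda = 1.2 nats, chunks of length c = 10, estimated
# per-chunk mutual informations in nats per symbol, slack delta2 = 0.01.
M_stop = stopping_time(n_lambda=1.2, c=10,
                       per_chunk_info=[0.05, 0.02, 0.08, 0.06], delta2=0.01)
# Here M_stop == 4; the decoder would then output the bin B(4, y, V_1^4).
```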

We will draw codewords uniformly from $T_{\bar M}$ and show that at most $L$ codewords will fall into each of the bins $B(M, y_1^{Mc}, \mathcal{V}_1^M)$.

Proof (of Lemma 4): Fix $\delta_2 > 0$ and let $\delta_1$ be chosen later. We begin with the codebook $T_{\bar M}$. Note that the truncation of this codebook to blocklength $Mc$ for $M \in \mathcal{M}$ is the codebook in Lemma 7. Let $\{Z_j : j \in [N]\}$ be $N = \exp(n\lambda)$ codewords drawn uniformly from $T_{\bar M}$. Consider a fixed $M$, $y_1^{Mc}$, and $\mathcal{V}_1^M \in \mathbb{V}(c)^M$ that satisfy the conditions of the decoding rule in (87). Then
$$\mathbb{P}\bigl(Z_j \in B(M, y_1^{Mc}, \mathcal{V}_1^M)\bigr) \le \frac{|B(M, y_1^{Mc}, \mathcal{V}_1^M)|}{\exp\bigl(Mc(H(X) - \delta_1)\bigr)} \qquad (139)$$
$$\le \exp\left(c\sum_{m=1}^{M}\left(\max_{V \in \mathcal{V}_m(y_{(mc)},\delta_1)} H(X_m|Y_m) - H(X)\right) + 2Mc\delta_1\right) \qquad (140)$$
$$\le \exp\left(-c\sum_{m=1}^{M} I\bigl(P, \mathcal{V}_m(y_{(mc)},\delta_1)\bigr) + 2Mc\delta_1\right). \qquad (141)$$
Let
$$G = \exp\left(-c\sum_{m=1}^{M} I\bigl(P, \mathcal{V}_m(y_{(mc)},\delta_1)\bigr) + 2Mc\delta_1\right). \qquad (142)$$

From our stopping rule, we know that $(M, y_1^{Mc}, \mathcal{V}_1^M)$ satisfies
$$n\lambda < c\sum_{m=1}^{M} I\bigl(P, \mathcal{V}_m(y_{(mc)},\delta_1)\bigr) - Mc\delta_2. \qquad (143)$$

The random variable $\mathbf{1}(Z_i \in B(M, y_1^{Mc}, \mathcal{V}_1^M))$ is Bernoulli with parameter smaller than $G$, so we can bound the probability that $L$ of the $N$ codewords land in $B(M, y_1^{Mc}, \mathcal{V}_1^M)$ using Sanov's theorem:
$$\mathbb{P}\left(\frac{1}{N}\sum_{i=1}^{N}\mathbf{1}\bigl(Z_i \in B(M, y_1^{Mc}, \mathcal{V}_1^M)\bigr) > L/N\right) \le (N+1)^2\exp\bigl(-N\,D(L/N \,\|\, G)\bigr). \qquad (144)$$

The exponent can be written as
$$N\,D(L/N \,\|\, G) = L\log\frac{L/N}{G} + N\left(1 - \frac{L}{N}\right)\log\frac{1 - L/N}{1 - G}. \qquad (145)$$
To deal with the $(1-L/N)\log((1-L/N)/(1-G))$ term we use the inequality $(1-a)\log(1-a) \ge -2a$ (for small $a$) on the term $(1-L/N)\log(1-L/N)$ and discard the small positive term $-(1-L/N)\log(1-G)$:
$$N\,D(L/N \,\|\, G) \ge L\log\frac{L/N}{G} - N\cdot 2(L/N) \qquad (146)$$
$$= L\log\frac{1}{NG} + L\log L - 2L \qquad (147)$$
$$= L\left(-n\lambda + c\sum_{m=1}^{M} I\bigl(P, \mathcal{V}_m(y_{(mc)},\delta_1)\bigr) - 2Mc\delta_1\right) + L\log L - 2L \qquad (148)$$
$$> L\,(Mc\delta_2 - 2Mc\delta_1) + L\log L - 2L. \qquad (149)$$

For large enough $n$ we can upper bound $(N+1)^2 \le \exp(2n\lambda + L)$. For large enough $L$, $L\log L > 3L$, so we can ignore those terms as well. This gives the bound
$$\mathbb{P}\left(\frac{1}{N}\sum_{i=1}^{N}\mathbf{1}\bigl(Z_i \in B(M, y_1^{Mc}, \mathcal{V}_1^M)\bigr) > L/N\right) \le \exp\bigl(-LMc(\delta_2 - 2\delta_1) + 2n\lambda - L\log L + 3L\bigr) \qquad (150)$$
$$\le \exp\bigl(-LMc(\delta_2 - 2\delta_1) + 2n\lambda\bigr). \qquad (151)$$
Thus we can choose $\delta_2 > 2\delta_1$ and
$$L > \frac{2n\lambda}{Mc\,(\delta_2 - 2\delta_1)}, \qquad (152)$$
which guarantees that the subsampling will yield a good list-decodable code. Since this subsampling must be good for all $M$, we must choose the largest lower bound on $L$, which is for $M = \underline{M}$. This gives
$$L > \frac{2\log|\mathcal{Y}|}{\delta_2 - 2\delta_1}. \qquad (153)$$
Choosing $\delta_1 = \delta_2/3$ and $E(\delta_2) = E_1(\delta_2/3)$ yields the result.
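As a numerical sanity check on (152) and (153), the snippet below evaluates the two lower bounds on the list size under the choice $\delta_1 = \delta_2/3$ made at the end of the proof; the particular numbers are arbitrary.

```python
import math

def list_bound_per_M(n_lambda, M, c, delta2, delta1):
    """Lower bound on L from (152) for one truncation M."""
    return 2.0 * n_lambda / (M * c * (delta2 - 2.0 * delta1))

def list_bound_worst_case(Y_size, delta2, delta1):
    """Lower bound on L from (153), sufficient for every M in the set M."""
    return 2.0 * math.log(Y_size) / (delta2 - 2.0 * delta1)

delta2 = 0.09
delta1 = delta2 / 3.0                      # the choice delta_1 = delta_2 / 3
print(list_bound_worst_case(Y_size=4, delta2=delta2, delta1=delta1))
# With delta2 - 2*delta1 = delta2/3 = 0.03, this prints roughly 92.4, so a
# constant list size of about 93 suffices in this toy parameterization.
```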

REFERENCES

[1] M. Agarwal, A. Sahai, and S. Mitter, "Coding into a source: a direct inverse rate-distortion theorem," in 45th Annual Allerton Conference on Communication, Control and Computation, 2006.
[2] R. Ahlswede, "A note on the existence of the weak capacity for channels with arbitrarily varying channel probability functions and its relation to Shannon's zero-error capacity," Annals of Mathematical Statistics, vol. 41, pp. 1027-1033, 1970.
[3] R. Ahlswede, "Channel capacities for list codes," Journal of Applied Probability, vol. 10, no. 4, pp. 824-836, 1973.
[4] R. Ahlswede, "Channels with arbitrarily varying channel probability functions in the presence of noiseless feedback," Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, vol. 25, pp. 239-252, 1973.
[5] R. Ahlswede, "Elimination of correlation in random codes for arbitrarily varying channels," Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, vol. 44, no. 2, pp. 159-175, 1978.
[6] R. Ahlswede, "A method of coding and an application to arbitrarily varying channels," Journal of Combinatorics, Information and System Sciences, vol. 5, pp. 10-35, 1980.
[7] R. Ahlswede, "The maximal error capacity of arbitrarily varying channels for constant list sizes," IEEE Transactions on Information Theory, vol. 39, no. 4, pp. 1416-1417, 1993.
[8] R. Ahlswede and J. Wolfowitz, "The capacity of a channel with arbitrarily varying channel probability functions and binary output alphabet," Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, vol. 15, no. 3, pp. 186-194, 1970.
[9] D. Blackwell, L. Breiman, and A. Thomasian, "The capacities of certain channel classes under random coding," Annals of Mathematical Statistics, vol. 31, pp. 558-567, 1960.
[10] V. Blinovsky, P. Narayan, and M. Pinsker, "Capacity of the arbitrarily varying channel under list decoding," Problems of Information Transmission, vol. 31, no. 2, pp. 99-113, 1995.
[11] V. Blinovsky and M. Pinsker, "Estimation of the size of the list when decoding over an arbitrarily varying channel," in Proceedings of the 1st French-Israeli Workshop on Algebraic Coding, ser. Lecture Notes in Computer Science, G. Cohen, S. Litsyn, A. Lobstein, and G. Zémor, Eds., no. 781. Berlin: Springer-Verlag, July 1993.
[12] T. Cover and J. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[13] I. Csiszár and J. Körner, "On the capacity of the arbitrarily varying channel for maximum probability of error," Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, vol. 57, pp. 87-101, 1981.
[14] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. Budapest: Akadémiai Kiadó, 1982.
[15] I. Csiszár and P. Narayan, "Arbitrarily varying channels with constrained inputs and states," IEEE Transactions on Information Theory, vol. 34, no. 1, pp. 27-34, 1988.
[16] I. Csiszár and P. Narayan, "The capacity of the arbitrarily varying channel revisited: positivity, constraints," IEEE Transactions on Information Theory, vol. 34, no. 2, pp. 181-193, 1988.
[17] I. Csiszár and P. Narayan, "Capacity of the Gaussian arbitrarily varying channel," IEEE Transactions on Information Theory, vol. 37, no. 1, pp. 18-26, 1991.
[18] S. Draper, B. Frey, and F. Kschischang, "Rateless coding for non-ergodic channels with decoder channel state information," submitted to IEEE Transactions on Information Theory.
[19] S. Draper, B. Frey, and F. Kschischang, "Efficient variable length channel coding for unknown DMCs," in Proceedings of the 2004 International Symposium on Information Theory, Chicago, USA, 2004.
[20] U. Erez, M. Trott, and G. Wornell, "Rateless coding for Gaussian channels," August 2007, arXiv:0708.2575v1 [cs.IT].
[21] T. Ericson, "Exponential error bounds for random codes on the arbitrarily varying channel," IEEE Transactions on Information Theory, vol. 31, no. 1, pp. 42-48, 1985.
[22] K. Eswaran, A. Sarwate, A. Sahai, and M. Gastpar, "Limited feedback achieves the empirical capacity," November 2007, submitted to IEEE Transactions on Information Theory. [Online]. Available: arXiv:0711.0237v1 [cs.IT]
[23] B. Hughes and P. Narayan, "Gaussian arbitrarily varying channels," IEEE Transactions on Information Theory, vol. 33, no. 2, pp. 267-284, 1987.
[24] B. Hughes and P. Narayan, "The capacity of a vector Gaussian arbitrarily varying channel," IEEE Transactions on Information Theory, vol. 34, no. 5, pp. 995-1003, 1988.
[25] B. Hughes, "The smallest list for the arbitrarily varying channel," IEEE Transactions on Information Theory, vol. 43, no. 3, pp. 803-815, 1997.
[26] B. Hughes and T. Thomas, "On error exponents for arbitrarily varying channels," IEEE Transactions on Information Theory, vol. 42, no. 1, pp. 87-98, 1996.
[27] J. Kiefer and J. Wolfowitz, "Channels with arbitrarily varying channel probability functions," Information and Control, vol. 5, no. 1, pp. 44-54, 1962.
[28] R. Koetter and A. Vardy, "Algebraic soft-decision decoding of Reed-Solomon codes," IEEE Transactions on Information Theory, vol. 49, no. 11, pp. 2809-2825, 2003.
[29] M. Langberg, "Private codes or succinct random codes that are (almost) perfect," in Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2004), Rome, Italy, 2004.
[30] M. Luby, "LT codes," in Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002.
[31] A. Sarwate and M. Gastpar, "Deterministic list codes for state-constrained arbitrarily varying channels," EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2007-5, January 8, 2007. [Online]. Available: http://sipc.eecs.berkeley.edu/
[32] O. Shayevitz and M. Feder, "Achieving the empirical capacity using feedback part I: Memoryless additive models," submitted to IEEE Transactions on Information Theory. [Online]. Available: http://www.eng.tau.ac.il/ofersha/empirical capacity part1.pdf
[33] A. Shokrollahi, "Fountain codes," in Proceedings of the 41st Allerton Conference on Communication, Control, and Computing, October 2003, pp. 1290-1297.
[34] N. Shulman, "Communication over an unknown channel via common broadcasting," Ph.D. dissertation, Tel Aviv University, 2003.
[35] A. Smith, "Scrambling adversarial errors using few random bits, optimal information reconciliation, and better private codes," in Proceedings of the 2007 ACM-SIAM Symposium on Discrete Algorithms (SODA 2007), 2007.
[36] A. Tchamkerten and I. E. Telatar, "Variable length coding over an unknown channel," IEEE Transactions on Information Theory, vol. 52, no. 5, pp. 2126-2145, May 2006.
