
Shannon Theory on General Probabilistic Theory

Keiji Matsumoto$^1$ and Gen Kimura$^2$


1: National Institute of Informatics, Chiyoda-ku, Tokyo, Japan
2: Shibaura Institute of Technology
July 30, 2017

Abstract
This paper is an effort to extend Shannon theory to general probabilistic theory.

1 General Probabilistic Theory


In a general probabilistic theory (GPT), a system comes with the set $\mathcal{M}$ of all measurements, the set $\mathcal{S}$ of all states, and a function assigning to each state $\omega \in \mathcal{S}$, measurement $M \in \mathcal{M}$, and measurement outcome $x \in \mathcal{X}$ the probability $P^M_\omega(x) \in [0,1]$ of obtaining $x$ when $M$ is applied to $\omega$. In this paper, $\mathcal{X}$ is an arbitrary finite set, for mathematical simplicity.

Given these, we can construct a representation of $\mathcal{S}$ and $\mathcal{M}$ on an ordered real linear space $B$ with the order unit $u$:
\[ \forall m \in B\ \exists c \in \mathbb{R} \quad m \le c\,u. \]
The construction of the representation is as follows. Let $L^\infty$ be the space of all bounded functions over $\mathcal{S}$, endowed with the sup-norm $\|f\| := \sup_{\omega\in\mathcal{S}} |f(\omega)|$ and the order $\le$ of point-wise comparison:
\[ f_1 \ge f_2 \iff f_1(\omega) \ge f_2(\omega), \quad \forall \omega \in \mathcal{S}. \]
This order is proper in the sense that the positive cone has non-empty interior. Also, $u$ is the element of $L^\infty$ with $u(\omega) = 1$ for all $\omega \in \mathcal{S}$.
Then $m_x : \omega \mapsto P^M_\omega(x)$ is an element of $L^\infty$. We call $m_x$ an effect, and the measurement $M$ is represented by $(m_x)_{x\in\mathcal{X}}$. We define $B$ as the norm closure of the linear span of all effects, where $\mathcal{X}$ ranges over all finite sets. Each measurement $M = (m_x)_{x\in\mathcal{X}}$ should satisfy
\[ m_x \ge 0, \qquad \sum_{x\in\mathcal{X}} m_x = u. \tag{1} \]

A state $\omega$ is represented by a positive element $\tau(\omega)$ of the dual $B^*$ of $B$ with
\[ \langle \tau(\omega), u \rangle = 1. \]

A state $\omega$ and a measurement $M$ define the probability distribution
\[ P^M_\omega(x) = \langle \tau(\omega), m_x \rangle, \]
whose physical interpretation is the probability of obtaining the measurement outcome $x$.
The representations $(m_x)_{x\in\mathcal{X}}$ and $\tau$ may not be bijective, but they are in one-to-one correspondence with the equivalence classes in the sense that
\[ \tau(\omega) = \tau(\omega') \iff \forall M \in \mathcal{M}\ \forall x \in \mathcal{X} \quad P^M_\omega(x) = P^M_{\omega'}(x), \]
\[ (m_x)_{x\in\mathcal{X}} = (m'_x)_{x\in\mathcal{X}} \iff \forall \omega \in \mathcal{S}\ \forall x \in \mathcal{X} \quad P^M_\omega(x) = P^{M'}_\omega(x). \]

Therefore, abusing notation, we identify $\mathcal{S}$ with $\tau(\mathcal{S}) \subset B^*$ and write $\omega$ instead of $\tau(\omega)$.

In this paper, to avoid mathematical difficulties, we suppose $\dim B < \infty$. Observe that a convex mixture of elements of $\mathcal{S}$ (being a subset of $B^*$, a convex mixture is well-defined) corresponds to a physical probabilistic mixture of states. Therefore, in this paper we assume
\[ \operatorname{cl}\operatorname{conv} \mathcal{S} = \mathcal{S}. \]
It turns out that
\[ \mathcal{S} = \{\omega;\ \omega \ge 0,\ \langle \omega, u \rangle = 1\} \subset B^*. \]
Similarly, we suppose $\mathcal{M}$ is the set of all $M = (m_x)_{x\in\mathcal{X}}$ satisfying (1), for simplicity, though we realize that putting additional restrictions on feasible measurements is often meaningful.
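The following is our own toy illustration, not part of the original text: for a classical $d$-outcome system, $B^* = \mathbb{R}^d$, states are probability vectors, effects are vectors $m$ with $0 \le m \le u$ entrywise, $u = (1,\dots,1)$, and $\langle\cdot,\cdot\rangle$ is the dot product.

\begin{verbatim}
import numpy as np

# A classical 3-outcome toy GPT: tau(omega) is a probability vector and u is
# the all-ones vector, so that <.,.> is the ordinary dot product.
u = np.ones(3)
omega = np.array([0.5, 0.3, 0.2])       # a state: omega >= 0 and <omega, u> = 1

# A two-outcome measurement M = (m_0, m_1): m_x >= 0 and m_0 + m_1 = u, cf. (1).
m0 = np.array([1.0, 0.5, 0.0])
m1 = u - m0
assert np.all(m0 >= 0) and np.all(m1 >= 0) and np.allclose(m0 + m1, u)

# Outcome probabilities P_omega^M(x) = <tau(omega), m_x>.
print(omega @ m0, omega @ m1)           # 0.65 and 0.35, summing to 1
\end{verbatim}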

2 Classical Message Sending on GPT


The theme we discuss in this paper is classical message sending on GPT: the sender chooses a message $x$ from the set $\mathcal{X}$ and encodes it into the state $\tilde\omega(x)$, which ends up as $\omega(x) \in \mathcal{S}$ at the output port after passing through certain physical processes. The receiver applies a measurement $M$ to recover the message.

In the above explanation, the channel is apparently difficult to fit into the GPT framework, since GPT lacks a theory of physical processes, at least one comparable to its classical or quantum mechanical counterparts. This problem, however, is easily bypassed using the following formalism, often used in quantum information theory. Observe that the probability of correct decoding is determined solely by the output state $\omega(x) \in \mathcal{S}$ corresponding to the message $x$, and that the physical process mapping $\tilde\omega(x)$ to $\omega(x)$ is not relevant. Thus, we represent a channel by a closed convex subset $F$ of the set $\mathcal{S}$ of states of the output system, and an encoder by a map $\omega(\cdot)$ of $\mathcal{X}$ into $F$.
A decoder is a measurement $\tilde M = (m_x)_{x\in\mathcal{X}\cup\{0\}}$ over the set $\mathcal{X} \cup \{0\}$, where $0$ represents "failure to decode". Two figures of merit are our concern: the size $|\mathcal{X}|$ of the set $\mathcal{X}$ of messages and the average success probability $P_{\mathrm{succ}}$ of decoding:
\[ P_{\mathrm{succ}} = \frac{1}{|\mathcal{X}|} \sum_{x\in\mathcal{X}} \langle \omega(x), m_x \rangle. \]

Note that $m_0$ appears in neither figure of merit. Therefore, $M = (m_x)_{x\in\mathcal{X}}$, which is subject to the restriction
\[ m_x \ge 0, \qquad \sum_{x\in\mathcal{X}} m_x \le u \tag{2} \]
instead of (1), is called a decoder.
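To make the two figures of merit concrete, here is a toy sketch of ours (not from the paper), in the classical system above: two messages, an encoder into $F$, and a decoder obeying (2).

\begin{verbatim}
import numpy as np

u = np.ones(3)
# Encoder: messages {0, 1} mapped to output states omega(0), omega(1) in F.
states = {0: np.array([0.9, 0.1, 0.0]), 1: np.array([0.1, 0.8, 0.1])}

# Decoder (m_0 for "failure" is implicit): m_x >= 0, sum_x m_x <= u, cf. (2).
decoder = {0: np.array([1.0, 0.0, 0.0]), 1: np.array([0.0, 1.0, 0.0])}
assert np.all(sum(decoder.values()) <= u + 1e-12)

# Average success probability P_succ = (1/|X|) sum_x <omega(x), m_x>.
p_succ = np.mean([states[x] @ decoder[x] for x in states])
print(p_succ)   # 0.85
\end{verbatim}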


Since exact optimization of these figures of merit is involved in general, in quantum and classical information theory we often concentrate on asymptotic behaviours, where the number $n$ of channel uses is very large. Suppose the set of states on the output port of the channel is a subset of $\mathcal{S}$. Then the set of states on the output port of $n$ uses of the channel should be a subset of $\mathcal{S}^n$, where $\mathcal{S}^n$ is the $n$-fold composition of $\mathcal{S}$. However, the trouble here is that there is no widely agreed manner of composing single systems in GPT. In quantum mechanics, if $\mathcal{S}$ is the set of all density operators over $\mathcal{H}$, then $\mathcal{S}^n$ is the set of all density operators over $\mathcal{H}^{\otimes n}$. In GPT, any convex set sandwiched between the minimal tensor and the maximal tensor is possible, and there is no guiding principle to single out $\mathcal{S}^n$.

Therefore, we view $B^n$, $\mathcal{S}^n$, $F^n$, $\mathcal{M}^n$, etc., simply as indexed ordered vector spaces, sets of states, channels, and sets of measurements, and do not assume any structural relations between those with different values of the index. In classical and quantum information theory, frameworks such as the information spectrum and the smooth R\'enyi entropy approach are in the same spirit: since they do not assume any properties of sequences of quantum states or probability distributions, the parameter $n$ can be viewed as a mere index numbering the systems. In this paper, we rely heavily on the frameworks and tools used in these theories.

3 Average Message Size


3.1 Definition and rewriting by duality

In classical message sending, the success probability $P_{\mathrm{succ}}$ of decoding and the size $|\mathcal{X}|$ of the set $\mathcal{X}$ of messages are our concern. Thus, we define the average message size, the maximized product of these two:
\[ N(F) := \sup_{\mathcal{X},\omega(\cdot),M} |\mathcal{X}| \times P_{\mathrm{succ}} = \sup_{\mathcal{X},\omega(\cdot),M} \sum_{x\in\mathcal{X}} \langle \omega(x), m_x \rangle. \tag{3} \]

This can also be written as
\[ N(F) = \sup_M \sum_{\omega\in F} \langle \omega, m_\omega \rangle, \]
where the supremum is taken over all $M$'s such that $m_\omega \ne 0$ for at most finitely many $\omega \in F$. (Then the sum on the RHS is well-defined.)
Then
\[ N(F) \le \sup_M \sum_{\omega\in F} \langle \omega, m_\omega \rangle + \langle \eta, G(M) \rangle \]
holds for any $\eta \ge 0$, where we define
\[ G(M) := u - \sum_{\omega\in F} m_\omega. \]

Therefore,
\[ N(F) \le \inf_{\eta\ge 0} \varphi(\eta), \tag{4} \]
where
\[ \varphi(\eta) := \sup_M \sum_{\omega\in F} \langle \omega, m_\omega \rangle + \langle \eta, G(M) \rangle = \langle \eta, u \rangle + \sup_M \sum_{\omega\in F} \langle \omega - \eta, m_\omega \rangle = \begin{cases} \langle \eta, u \rangle, & \text{if } \eta \ge \omega\ \ \forall \omega \in F, \\ \infty, & \text{otherwise.} \end{cases} \]

Below we show that the inequality (4) is saturated and that $\inf$ on the RHS can be replaced by $\min$, using Lemma 15. To this end, we check the hypotheses of the lemma: (i) the existence of a decoder $M_1$ with $G(M_1) > 0$, and (ii) finiteness of $N(F)$.

First, consider $M_1$ such that
\[ m_\omega = \begin{cases} u - m_1, & \omega = \omega_0, \\ 0, & \text{otherwise.} \end{cases} \]
Recall that the order is proper in the sense that there is an interior point $m'_1$ of the positive cone. Also, there is $a > 0$ with $a u \ge m'_1$. Thus, if $m_1 = \frac{1}{a} m'_1$, $M_1$ satisfies (2) and $G(M_1) = m_1 > 0$. Thus (i) is verified.
Second, $N(F)$, being positive, is bounded from below. To show that $N(F)$ is bounded from above, we use (4) and the assumption that $B^*$ is finite-dimensional: let $(\xi_i)_{i=1}^l$ be a maximal set of linearly independent vectors in $F$. Then each $\omega \in F$ is written as
\[ \omega = \sum_{i=1}^l a_i \xi_i \le \max_i |a_i| \sum_{i=1}^l \xi_i. \]

Since the map $\omega \mapsto \max_i |a_i|$ is continuous and $F$ is compact, $\max_i |a_i|$ is bounded from above by some $a_{\max} > 0$. Therefore,
\[ \omega \le a_{\max} \sum_{i=1}^l \xi_i, \]
and
\[ N(F) \le \varphi\Big(a_{\max}\sum_{i=1}^l \xi_i\Big) = a_{\max} \sum_{i=1}^l \langle \xi_i, u \rangle = a_{\max}\, l < \infty, \]
verifying (ii).
Thus, $N(F)$ is rewritten as follows:
\[ N(F) = \min\{\langle \eta, u \rangle;\ \eta \ge 0,\ \omega \le \eta\ \forall \omega \in F\} = \min\{c;\ c \ge 0,\ \omega_0 \in \mathcal{S},\ \omega \le c\,\omega_0\ \forall \omega \in F\}. \]
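In the classical case this minimization has a closed form. If $\mathcal{S}$ is the probability simplex and $F = \operatorname{conv}\{p_1,\dots,p_k\}$, the constraint $p_i \le c\,\omega_0$ entrywise forces $c\,\omega_0(x) \ge \max_i p_i(x)$, so $N(F) = \sum_x \max_i p_i(x)$, attained by $\omega_0 \propto \max_i p_i$. The following sketch (ours, not from the paper) computes this:

\begin{verbatim}
import numpy as np

def n_classical(extreme_points):
    """N(F) for F = conv of the given distributions (rows), via
    N(F) = min{c : exists distribution q with p <= c*q for all p in F}.
    Checking extreme points suffices, since p <= c*q survives convex mixing."""
    m = np.max(extreme_points, axis=0)   # entrywise maximum over extreme points
    return m.sum()                       # optimal q is m / m.sum()

p1 = np.array([0.9, 0.1, 0.0])
p2 = np.array([0.1, 0.8, 0.1])
print(n_classical(np.stack([p1, p2])))   # 0.9 + 0.8 + 0.1 = 1.8
\end{verbatim}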

3.2 Max-relative entropy and its smoothed version


In this subsection we introduce
\[ D_{\max}(\omega\|\omega_0) := \min\{\lambda;\ \omega \le 2^\lambda \omega_0,\ \lambda \ge 0\}, \]
\[ D^\varepsilon_{\max}(\omega\|\omega_0) := \min_{\omega'\in B^1_\varepsilon(\omega)} D_{\max}(\omega'\|\omega_0), \]
where $B^1_\varepsilon(\omega)$ is the closed $\varepsilon$-ball centered at $\omega$ with respect to the norm $\|\cdot\|_1$. The former and the latter are GPT versions of Renner's max-relative entropy and of the $\varepsilon$-smoothed max-relative entropy, respectively [2][3].
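For classical states (probability vectors, ordered entrywise) the unsmoothed quantity reduces to $D_{\max}(p\|q) = \log_2 \max_x p(x)/q(x)$, the smallest $\lambda$ with $p \le 2^\lambda q$. A short sketch of ours, which also spot-checks the convexity inequality (5) of Proposition 1 below:

\begin{verbatim}
import numpy as np

def d_max(p, q):
    """Classical max-relative entropy: least lambda >= 0 with p <= 2**lambda * q."""
    supp = p > 0
    if np.any(q[supp] == 0):
        return np.inf                      # no finite lambda works
    return max(np.log2(np.max(p[supp] / q[supp])), 0.0)

p1, p2 = np.array([0.9, 0.1]), np.array([0.2, 0.8])
q = np.array([0.5, 0.5])
w = 0.3 * p1 + 0.7 * p2                    # a convex mixture of states
# Inequality (5): 2^{D_max(mix)} <= 0.3 * 2^{D_max(p1)} + 0.7 * 2^{D_max(p2)}.
assert 2**d_max(w, q) <= 0.3 * 2**d_max(p1, q) + 0.7 * 2**d_max(p2, q) + 1e-12
\end{verbatim}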

Proposition 1 If $\omega = p\,\omega_1 + (1-p)\,\omega_2$ and $\omega_0 = p\,\omega_{0,1} + (1-p)\,\omega_{0,2}$ ($p \in [0,1]$), then
\[ 2^{D_{\max}(\omega\|\omega_0)} \le p\,2^{D_{\max}(\omega_1\|\omega_0)} + (1-p)\,2^{D_{\max}(\omega_2\|\omega_0)}, \tag{5} \]
\[ D_{\max}(\omega\|\omega_0) \le p\,D_{\max}(\omega\|\omega_{0,1}) + (1-p)\,D_{\max}(\omega\|\omega_{0,2}), \tag{6} \]
and
\[ 2^{D^\varepsilon_{\max}(\omega\|\omega_0)} \le p\,2^{D^\varepsilon_{\max}(\omega_1\|\omega_0)} + (1-p)\,2^{D^\varepsilon_{\max}(\omega_2\|\omega_0)}, \tag{7} \]
\[ D^\varepsilon_{\max}(\omega\|\omega_0) \le p\,D^\varepsilon_{\max}(\omega\|\omega_{0,1}) + (1-p)\,D^\varepsilon_{\max}(\omega\|\omega_{0,2}). \tag{8} \]
Proof. It suffices to show the last two inequalities, since $D^0_{\max} = D_{\max}$.
To show (7), observe that for any $\omega'_i \in B^1_\varepsilon(\omega_i)$ ($i = 1, 2$),
\[ \omega' := p\,\omega'_1 + (1-p)\,\omega'_2 = p(\omega'_1 - \omega_1) + (1-p)(\omega'_2 - \omega_2) + \omega \]
is an element of $p B^1_\varepsilon(0) + (1-p) B^1_\varepsilon(0) + \omega = B^1_\varepsilon(\omega)$. If $\omega'_i \le 2^{\lambda_i}\omega_0$ ($i = 1, 2$) in addition, then
\[ \omega' \le \big(p\,2^{\lambda_1} + (1-p)\,2^{\lambda_2}\big)\,\omega_0. \]
Thus we have the assertion.


To show (8), let $\omega''_i \in B^1_\varepsilon(\omega)$ attain the minimum in the definition of $D^\varepsilon_{\max}(\omega\|\omega_{0,i})$, and set $\lambda_i := D_{\max}(\omega''_i\|\omega_{0,i}) = D^\varepsilon_{\max}(\omega\|\omega_{0,i})$, so that $\omega''_i \le 2^{\lambda_i}\omega_{0,i}$ ($i = 1, 2$). Let
\[ \lambda := -\log_2\big(p\,2^{-\lambda_1} + (1-p)\,2^{-\lambda_2}\big), \]
so that
\[ p\,2^{\lambda-\lambda_1} + (1-p)\,2^{\lambda-\lambda_2} = 1. \]
Then
\[ \omega'' := p\,2^{\lambda-\lambda_1}\omega''_1 + (1-p)\,2^{\lambda-\lambda_2}\omega''_2 \le p\,2^{\lambda-\lambda_1}2^{\lambda_1}\omega_{0,1} + (1-p)\,2^{\lambda-\lambda_2}2^{\lambda_2}\omega_{0,2} = 2^\lambda \omega_0. \]
Also, $\omega'' \in B^1_\varepsilon(\omega)$ due to the convexity of $B^1_\varepsilon(\omega)$. Therefore, using the convexity of $-\log_2$,
\begin{align*}
D^\varepsilon_{\max}(\omega\|\omega_0) \le D_{\max}(\omega''\|\omega_0) \le \lambda &= -\log_2\big(p\,2^{-\lambda_1} + (1-p)\,2^{-\lambda_2}\big) \\
&\le p\,\lambda_1 + (1-p)\,\lambda_2 = p\,D^\varepsilon_{\max}(\omega\|\omega_{0,1}) + (1-p)\,D^\varepsilon_{\max}(\omega\|\omega_{0,2}),
\end{align*}
which proves (8).

3.3 Representation by max-relative entropy

By the duality result of Section 3.1,
\[ N(F) = \min\{c;\ c \ge 0,\ \omega_0 \in \mathcal{S},\ \omega \le c\,\omega_0\ \forall \omega \in F\} = \inf_{\omega_0\in\mathcal{S}} \max_{\omega\in F} 2^{D_{\max}(\omega\|\omega_0)}. \]
Define also
\[ \log_2 N^\varepsilon(F) := \inf_{\omega_0\in\mathcal{S}} \max_{\omega\in F} D^\varepsilon_{\max}(\omega\|\omega_0). \tag{9} \]
By the convexity (8) in $\omega_0$ and Fan's minimax theorem (Lemma 14 below),
\[ \log_2 N^\varepsilon(F) = \inf_{\omega_0\in\mathcal{S}} \max_{\omega\in F} D^\varepsilon_{\max}(\omega\|\omega_0) = \inf_{\omega_0\in\mathcal{S}} \sup_P \mathrm{E}_P\, D^\varepsilon_{\max}(\omega\|\omega_0) = \sup_P \inf_{\omega_0\in\mathcal{S}} \mathrm{E}_P\, D^\varepsilon_{\max}(\omega\|\omega_0), \]
where $P$ ranges over finitely supported probability distributions on $F$ and $\mathrm{E}_P$ denotes the expectation over $\omega \sim P$.

4 An upper bound to capacity
Motivated by the consideration of the asymptotic efficiency of message sending over (possibly) noisy channels, let us consider a sequence of systems $\mathcal{S}^n$ ($n = 1, 2, \dots$) with underlying ordered vector spaces $B^n$ with unit elements $u^n$, sets $\mathcal{M}^n$ of all measurements, and sets $\mathcal{S}^n$ of all states.
Then a channel is represented by a convex subset $F^n$ of $\mathcal{S}^n$, which physically corresponds to the set of states of the output system of the (parallel use of) channel(s). Let $\mathcal{X}^n$ be the set of messages to be sent, $\omega^n(\cdot)$ be a map $x \in \mathcal{X}^n \mapsto \omega^n(x) \in F^n$ representing an encoder, and $M^n$ be a tuple of effects $M^n = \{m^n_x\}_{x\in\mathcal{X}^n}$ with
\[ \sum_{x\in\mathcal{X}^n} m^n_x \le u^n, \]
representing the decoder measurement. Given a sequence $\{F^n\}_{n=1}^\infty$ of channels, we optimize the sequence $\{\mathcal{X}^n, \omega^n(\cdot), M^n\}_{n=1}^\infty$.
Define
\[ C(\{F^n\}_{n=1}^\infty) := \sup_{\{\mathcal{X}^n,\omega^n(\cdot),M^n\}_{n=1}^\infty} \Big\{R;\ \lim_{n\to\infty} \frac{1}{|\mathcal{X}^n|}\sum_{x\in\mathcal{X}^n} \langle \omega^n(x), m^n_x \rangle > 0,\ |\mathcal{X}^n| = \lfloor 2^{nR}\rfloor\Big\}, \]
\[ C^a(\{F^n\}_{n=1}^\infty) := \sup_{\{\mathcal{X}^n,\omega^n(\cdot),M^n\}_{n=1}^\infty} \Big\{R;\ \lim_{n\to\infty} \frac{1}{|\mathcal{X}^n|}\sum_{x\in\mathcal{X}^n} \langle \omega^n(x), m^n_x \rangle \ge a,\ |\mathcal{X}^n| = \lfloor 2^{nR}\rfloor\Big\}. \]

Remark 2 An alternative definition of $C$ is:
\[ C(\{F^n\}_{n=1}^\infty) := \sup_{\{\mathcal{X}^n,\omega^n(\cdot),M^n\}_{n=1}^\infty} \Big\{\lim_{n\to\infty}\frac{1}{n}\log_2|\mathcal{X}^n|;\ \lim_{n\to\infty} \frac{1}{|\mathcal{X}^n|}\sum_{x\in\mathcal{X}^n} \langle \omega^n(x), m^n_x \rangle > 0\Big\}. \]
These two definitions are indeed equivalent. Let $C'$ denote the RHS of the above alternative definition. Since this alternative definition puts no restriction on $|\mathcal{X}^n|$, $C' \ge C$.
On the other hand, for any $\varepsilon > 0$, there is a sequence $\{\mathcal{X}^n_1, \omega^n_1(\cdot), M^n_1\}_{n=1}^\infty$ and $n_\varepsilon$ such that
\[ \forall n \ge n_\varepsilon, \quad \frac{1}{n}\log_2|\mathcal{X}^n_1| \ge C' - \varepsilon. \]
If $|\mathcal{X}^n_1| > \lfloor 2^{n(C'-\varepsilon)}\rfloor$, we only have to reduce the size of $\mathcal{X}^n_1$ to $\lfloor 2^{n(C'-\varepsilon)}\rfloor$: align the elements of $\mathcal{X}^n_1$ so that $\langle \omega^n_1(x), m^n_{1,x} \rangle$ is decreasing, let $\mathcal{X}^n$ be the set of the first $\lfloor 2^{n(C'-\varepsilon)}\rfloor$ elements of $\mathcal{X}^n_1$, and let $\omega^n(x) = \omega^n_1(x)$ and $m^n_x = m^n_{1,x}$. Then $|\mathcal{X}^n| = \lfloor 2^{n(C'-\varepsilon)}\rfloor$ and the average success probability does not decrease:
\[ \frac{1}{|\mathcal{X}^n|}\sum_{x\in\mathcal{X}^n} \langle \omega^n(x), m^n_x \rangle \ge \frac{1}{|\mathcal{X}^n_1|}\sum_{x\in\mathcal{X}^n_1} \langle \omega^n_1(x), m^n_{1,x} \rangle. \]
$C^a$ also admits a similar alternative definition, which is equivalent to the one above.

Obviously,
\[ \lim_{a\downarrow 0} C^a = \sup_{a>0} C^a = \sup_{\{\mathcal{X}^n,\omega^n(\cdot),M^n\}_{n=1}^\infty} \sup_{a>0} \Big\{R;\ \lim_{n\to\infty} \frac{1}{|\mathcal{X}^n|}\sum_{x\in\mathcal{X}^n} \langle \omega^n(x), m^n_x \rangle \ge a,\ |\mathcal{X}^n| = \lfloor 2^{nR}\rfloor\Big\} = C. \]

Lemma 3 Let $Z \subset \mathbb{R} \times A$, where $A$ is a non-empty set. Then
\[ \inf_{a\in A} \inf\{c;\ (c,a)\in Z\} = \inf\{c;\ \exists a\in A,\ (c,a)\in Z\}. \tag{10} \]
If, moreover, each section $\{c;\ (c,a)\in Z\}$ is an upper set (i.e., $(c,a)\in Z$ and $c' \ge c$ imply $(c',a)\in Z$; this is the case in all applications below), then
\[ \sup_{a\in A} \inf\{c;\ (c,a)\in Z\} = \inf\{c;\ \forall a\in A,\ (c,a)\in Z\}. \tag{11} \]

Proof. (10) holds since
\[ \inf_{a\in A}\inf\{c;\ (c,a)\in Z\} = \inf\{c;\ (c,a)\in Z \text{ for some } a \in A\} \]
and "$(c,a)\in Z$ for some $a$" is exactly "$\exists a\in A,\ (c,a)\in Z$".
Almost in parallel, we have
\[ \sup_{a\in A}\sup\{c;\ (c,a)\in Z^c\} = \sup\{c;\ \exists a\in A,\ (c,a)\in Z^c\}. \]
Rewriting its LHS and RHS, using the upper-set property of the sections, by
\[ \sup\{c;\ (c,a)\in Z^c\} = \inf\{c;\ (c,a)\in Z\} \]
and
\[ \sup\{c;\ \exists a\in A,\ (c,a)\in Z^c\} = \inf\{c;\ \forall a\in A,\ (c,a)\in Z\}, \]
respectively, we have (11).

Lemma 4 For any measurement $M = \{m_x\}_{x\in\mathcal{X}}$ and $\omega(x) \in F$,
\[ \sum_{x\in\mathcal{X}} \langle \omega(x), m_x \rangle \le N^\varepsilon(F) + \varepsilon|\mathcal{X}|. \tag{12} \]

Proof. By (10) and (11),
\begin{align*}
N^\varepsilon(F) &= \inf_{\omega_0\in\mathcal{S}} \sup_{\omega\in F} \inf_{\omega'\in B^1_\varepsilon(\omega)} 2^{D_{\max}(\omega'\|\omega_0)} \\
&= \inf_{\omega_0\in\mathcal{S}} \sup_{\omega\in F} \inf_{\omega'\in B^1_\varepsilon(\omega)} \inf_c \{c;\ \omega' \le c\,\omega_0\} \\
&= \inf_{\omega_0\in\mathcal{S}} \sup_{\omega\in F} \inf_c \{c;\ \exists \omega'\in B^1_\varepsilon(\omega),\ \omega' \le c\,\omega_0\} \\
&= \inf_{\omega_0\in\mathcal{S}} \inf_{c\ge 1} \{c;\ \forall \omega\in F,\ \exists \omega'\in B^1_\varepsilon(\omega),\ \omega' \le c\,\omega_0\},
\end{align*}

where we used the above lemma. For any $\omega'(\cdot)$ with $\omega'(x) \in B^1_\varepsilon(\omega(x))$, and $c$, $\omega_0$ with $\omega'(x) \le c\,\omega_0$,
\begin{align*}
\sum_{x\in\mathcal{X}} \langle \omega(x), m_x \rangle &\le \sum_{x\in\mathcal{X}} \langle \omega'(x), m_x \rangle + \varepsilon|\mathcal{X}| \\
&\le c \sum_{x\in\mathcal{X}} \langle \omega_0, m_x \rangle + \varepsilon|\mathcal{X}| \le c + \varepsilon|\mathcal{X}|.
\end{align*}
Thus, taking the infimum over $c$ and $\omega_0$, we have the asserted inequality.
Lemma 5
\[ C^a \ge \lim_{n\to\infty} \frac{1}{n}\log_2 \sup_{\mathcal{X}^n,\omega^n(\cdot),M^n} \Big\{\sum_{x\in\mathcal{X}^n} \langle \omega^n(x), m^n_x \rangle - a|\mathcal{X}^n|\Big\}, \tag{13} \]
and
\[ C^a \le \lim_{n\to\infty} \frac{1}{n}\log_2 \sup_{\mathcal{X}^n,\omega^n(\cdot),M^n} \Big\{\sum_{x\in\mathcal{X}^n} \langle \omega^n(x), m^n_x \rangle - (a-\delta)|\mathcal{X}^n|\Big\}, \quad \forall \delta \in (0,a), \tag{14} \]
where $|\mathcal{X}^n| = \lfloor 2^{nR}\rfloor$.
Proof. To show (13), observe
\[ C^a \ge \sup_{\{\mathcal{X}^n,\omega^n(\cdot),M^n\}_{n=1}^\infty} \Big\{R;\ \forall n,\ \frac{1}{|\mathcal{X}^n|}\sum_{x\in\mathcal{X}^n} \langle \omega^n(x), m^n_x \rangle \ge a\Big\}. \]
(The RHS is well-defined, since the set $\{R;\ \forall n,\ \frac{1}{|\mathcal{X}^n|}\sum_{x\in\mathcal{X}^n} \langle \omega^n(x), m^n_x \rangle \ge a\}$ is not empty: for any channel, $0$ (which means $|\mathcal{X}^n| = 1$ for all $n$) is an element of this set.) Rearrange the terms of the trivially valid inequality
\[ \frac{1}{|\mathcal{X}^n|}\sum_{x\in\mathcal{X}^n} \langle \omega^n(x), m^n_x \rangle \le 1 + a \]
to obtain
\[ |\mathcal{X}^n| \ge \sum_{x\in\mathcal{X}^n} \langle \omega^n(x), m^n_x \rangle - a|\mathcal{X}^n|. \]
Therefore,
\begin{align*}
\sup_{\mathcal{X}^n,\omega^n(\cdot),M^n} \Big\{|\mathcal{X}^n|;\ \frac{1}{|\mathcal{X}^n|}\sum_{x\in\mathcal{X}^n} \langle \omega^n(x), m^n_x \rangle \ge a\Big\}
&\ge \sup_{\mathcal{X}^n,\omega^n(\cdot),M^n} \Big\{\sum_{x\in\mathcal{X}^n} \langle \omega^n(x), m^n_x \rangle - a|\mathcal{X}^n|;\ \sum_{x\in\mathcal{X}^n} \langle \omega^n(x), m^n_x \rangle - a|\mathcal{X}^n| \ge 0\Big\} \\
&= \sup_{\mathcal{X}^n,\omega^n(\cdot),M^n} \Big\{\sum_{x\in\mathcal{X}^n} \langle \omega^n(x), m^n_x \rangle - a|\mathcal{X}^n|\Big\}.
\end{align*}

With $|\mathcal{X}^n| = \lfloor 2^{nR}\rfloor$, taking $\lim_{n\to\infty}$ of both ends, we have the asserted inequality.
To show (14), consider $\{\mathcal{X}^n, \omega^n(\cdot), M^n\}$ such that $\lim_{n\to\infty} \frac{1}{|\mathcal{X}^n|}\sum_{x\in\mathcal{X}^n} \langle \omega^n(x), m^n_x \rangle \ge a$, and let $\{n_k\}$ be an increasing sequence of natural numbers which eventually satisfies
\[ \frac{1}{|\mathcal{X}^{n_k}|}\sum_{x\in\mathcal{X}^{n_k}} \langle \omega^{n_k}(x), m^{n_k}_x \rangle \ge a - \frac{\delta}{2}, \]
or equivalently,
\[ \sum_{x\in\mathcal{X}^{n_k}} \langle \omega^{n_k}(x), m^{n_k}_x \rangle - (a-\delta)|\mathcal{X}^{n_k}| \ge \frac{\delta}{2}|\mathcal{X}^{n_k}|. \]
Rearranging terms and taking the limit,
\begin{align*}
R &\le \lim_{k\to\infty} \frac{1}{n_k}\log_2 \Big\{\sum_{x\in\mathcal{X}^{n_k}} \langle \omega^{n_k}(x), m^{n_k}_x \rangle - (a-\delta)|\mathcal{X}^{n_k}|\Big\} \\
&\le \lim_{n\to\infty} \frac{1}{n}\log_2 \sup_{\mathcal{X}^n,\omega^n(\cdot),M^n} \Big\{\sum_{x\in\mathcal{X}^n} \langle \omega^n(x), m^n_x \rangle - (a-\delta)|\mathcal{X}^n|\Big\}.
\end{align*}

Theorem 6
\[ C^a \le \lim_{\varepsilon\downarrow 0}\lim_{n\to\infty} \frac{1}{n}\log_2 N^\varepsilon(F^n) = \lim_{\varepsilon\downarrow 0}\lim_{n\to\infty} \frac{1}{n} \inf_{\omega^n_0\in\mathcal{S}^n} \sup_{\omega^n\in F^n} D^\varepsilon_{\max}(\omega^n\|\omega^n_0), \]
\[ C \le \lim_{\varepsilon\downarrow 0}\lim_{n\to\infty} \frac{1}{n}\log_2 N^\varepsilon(F^n) = \lim_{\varepsilon\downarrow 0}\lim_{n\to\infty} \frac{1}{n} \inf_{\omega^n_0\in\mathcal{S}^n} \sup_{\omega^n\in F^n} D^\varepsilon_{\max}(\omega^n\|\omega^n_0). \tag{15} \]

Proof. By (12),
\[ \log_2 N^{a-\delta}(F^n) \ge \log_2 \sup_{\mathcal{X}^n,\omega^n(\cdot),M^n} \Big\{\sum_{x\in\mathcal{X}^n} \langle \omega^n(x), m^n_x \rangle - (a-\delta)|\mathcal{X}^n|\Big\}, \]
where $|\mathcal{X}^n| = \lfloor 2^{nR}\rfloor$. By (14),
\[ C^a \le \lim_{n\to\infty} \frac{1}{n}\log_2 N^{a-\delta}(F^n), \quad \forall \delta \in (0,a). \]
Since $N^\varepsilon(F^n)$ is decreasing in $\varepsilon$, we have the first inequality. The second one is obtained from $\lim_{a\downarrow 0} C^a = C$.

4.1 Noiseless channel of symmetric GPT

Suppose that $\mathcal{S}^n$ is symmetric, or equivalently, that for each pair of extreme points $\omega$ and $\omega'$ there is an affine bijection $g$ of $\mathcal{S}^n$ with $g(\omega) = \omega'$. The set of all affine bijections of $\mathcal{S}^n$ is denoted by $G^n$; it can be regarded as a closed subgroup of an orthogonal group. Therefore, $G^n$ admits a Haar measure $\mu^n$ with $\mu^n(G^n) = 1$. Suppose the channel is $F^n = \mathcal{S}^n$.
Lemma 7 If $g \in G^n$ and $\eta \in (B^n)^*$, then
\[ \|g(\eta)\|_1 = \|\eta\|_1. \]

Proof. Define $g^*$ by duality:
\[ \langle g(\eta), m \rangle = \langle \eta, g^*(m) \rangle. \]
Recall that $\omega \in \mathcal{S}^n$ is equivalent to $g(\omega) \in \mathcal{S}^n$. Thus,
\[ \|g^*(m)\| = \sup_{\omega\in\mathcal{S}^n} \langle \omega, g^*(m) \rangle = \sup_{\omega\in\mathcal{S}^n} \langle g(\omega), m \rangle = \sup_{g(\omega)\in\mathcal{S}^n} \langle g(\omega), m \rangle = \|m\|. \]
Similarly, $\|(g^{-1})^*(m)\| = \|(g^*)^{-1}(m)\| = \|m\|$. Thus $\|m\| \le 1$ is equivalent to $\|g^*(m)\| \le 1$. Therefore,
\[ \|g(\eta)\|_1 = \sup_{m:\|m\|\le 1} \langle g(\eta), m \rangle = \sup_{m:\|m\|\le 1} \langle \eta, g^*(m) \rangle = \sup_{m:\|g^*(m)\|\le 1} \langle \eta, g^*(m) \rangle = \|\eta\|_1. \]
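As a sanity check of Lemma 7 (our illustration, not in the paper): for a classical system the reversible symmetries include the coordinate permutations, under which the $\ell_1$ norm is manifestly invariant.

\begin{verbatim}
import numpy as np

# Classical instance of Lemma 7: g permutes coordinates, ||.||_1 is the l1 norm.
rng = np.random.default_rng(1)
eta = rng.normal(size=4)                   # an element of (B^n)* = R^4
g = rng.permutation(4)                     # a reversible symmetry of the simplex
assert np.isclose(np.abs(eta).sum(), np.abs(eta[g]).sum())
\end{verbatim}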

Lemma 8 Let $\omega_m := \int_{g\in G^n} g(\omega_0)\, d\mu^n(g)$, where $\omega_0$ is an arbitrary element of $\mathcal{S}^n$. Then
\[ \forall \omega \in \mathcal{S}^n, \quad \omega \le N(\mathcal{S}^n)\,\omega_m. \]

Proof. This was shown in a previous manuscript.

Lemma 9 With the notation of the previous lemma,
\[ \forall \omega \in \mathcal{S}^n\ \exists \xi \in B^1_\varepsilon(\omega), \quad \xi \le N^\varepsilon(\mathcal{S}^n)\,\omega_m. \]

Proof. Let $\omega$ be an arbitrary element of $\mathcal{S}^n$. Below we show the existence of $\xi \in B^1_\varepsilon(\omega)$ with $\xi \le N^\varepsilon(\mathcal{S}^n)\,\omega_m$. For $\delta > 0$, let $\omega'$ be a state with
\[ \sup_{\omega\in\mathcal{S}^n} D^\varepsilon_{\max}(\omega\|\omega') \le \log_2\big(N^\varepsilon(\mathcal{S}^n) + \delta\big). \]
Then to each $g \in G^n$ there is $\xi_g \in B^1_\varepsilon(g(\omega))$ with
\[ \big(N^\varepsilon(\mathcal{S}^n) + \delta\big)\,\omega' - \xi_g \ge 0. \]
Then $g^{-1}(\xi_g) \in B^1_\varepsilon(\omega)$ by Lemma 7, and
\[ \big(N^\varepsilon(\mathcal{S}^n) + \delta\big)\,g^{-1}(\omega') - g^{-1}(\xi_g) \ge 0. \]
Therefore, integrating over $G^n$ and using $\int_{g\in G^n} g^{-1}(\omega')\,d\mu^n(g) = \omega_m$,
\[ 0 \le \big(N^\varepsilon(\mathcal{S}^n) + \delta\big)\int_{g\in G^n} g^{-1}(\omega')\,d\mu^n(g) - \int_{g\in G^n} g^{-1}(\xi_g)\,d\mu^n(g) = \big(N^\varepsilon(\mathcal{S}^n) + \delta\big)\,\omega_m - \int_{g\in G^n} g^{-1}(\xi_g)\,d\mu^n(g). \]
Also,
\[ \Big\|\omega - \int_{g\in G^n} g^{-1}(\xi_g)\,d\mu^n(g)\Big\|_1 = \Big\|\int_{g\in G^n} \big(\omega - g^{-1}(\xi_g)\big)\,d\mu^n(g)\Big\|_1 \le \int_{g\in G^n} \|\omega - g^{-1}(\xi_g)\|_1\, d\mu^n(g) \le \varepsilon. \]
Thus, $\xi_\delta := \int_{g\in G^n} g^{-1}(\xi_g)\,d\mu^n(g)$ is an element of $B^1_\varepsilon(\omega)$ with $\xi_\delta \le (N^\varepsilon(\mathcal{S}^n)+\delta)\,\omega_m$. Since $B^1_\varepsilon(\omega)$ is compact, a limit point $\xi$ of $\xi_\delta$ as $\delta \searrow 0$ satisfies the assertion.

Lemma 10 Suppose $\langle \eta, u \rangle = 0$. Then there is $\eta_+ \ge 0$ such that $\eta \le \eta_+$ and
\[ \|\eta\|_1 = 2\langle \eta_+, u \rangle. \]

Proof. If $m' = 2m - u$, then $0 \le m \le u$ is equivalent to $-u \le m' \le u$, and, since $\langle \eta, u \rangle = 0$,
\[ \langle \eta, m' \rangle = \langle \eta, 2m - u \rangle = 2\langle \eta, m \rangle. \]
Thus
\[ \|\eta\|_1 := \sup_{m':-u\le m'\le u} \langle \eta, m' \rangle = 2 \sup_{m:0\le m\le u} \langle \eta, m \rangle. \]
Also, by the strong duality of convex optimization,
\begin{align*}
\sup_{m:0\le m\le u} \langle \eta, m \rangle &= \min_{\eta_+\ge 0}\ \sup_{m\ge 0}\ \{\langle \eta, m \rangle - \langle \eta_+, m - u \rangle\} \\
&= \min_{\eta_+\ge 0}\ \Big\{\sup_{m\ge 0} \langle \eta - \eta_+, m \rangle + \langle \eta_+, u \rangle\Big\} \\
&= \min\{\langle \eta_+, u \rangle;\ \eta_+ - \eta \ge 0,\ \eta_+ \ge 0\}.
\end{align*}
Thus the minimizer of the last expression satisfies the requirement.
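In the classical case the decomposition of Lemma 10 is just the entrywise positive part, and the identity $\|\eta\|_1 = 2\langle \eta_+, u \rangle$ can be checked directly (our illustration):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
eta = rng.normal(size=5)
eta -= eta.mean()                    # enforce <eta, u> = 0 for u = all-ones
eta_plus = np.maximum(eta, 0.0)      # eta_plus >= 0 and eta <= eta_plus entrywise

u = np.ones_like(eta)
# Since the total sums to zero, positive and negative parts have equal weight:
assert np.isclose(np.abs(eta).sum(), 2 * (eta_plus @ u))
\end{verbatim}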

Theorem 11 Under the hypotheses of the previous lemmas,
\[ \lim_{n\to\infty} \frac{1}{n}\log_2 N^\varepsilon(\mathcal{S}^n) = \lim_{n\to\infty} \frac{1}{n}\log_2 N(\mathcal{S}^n). \]

Proof. Since "$\le$" is trivial by definition, we prove "$\ge$". By the previous lemma, for any $\eta$ with $\langle \eta, u \rangle = 0$ there is $\eta_+ \ge 0$ with
\[ \eta \le \eta_+ \]
and $\langle \eta_+, u \rangle = \|\eta\|_1/2$. Applying Lemma 8 to $\eta_+/\langle \eta_+, u \rangle \in \mathcal{S}^n$ gives $\eta_+ \le \langle \eta_+, u \rangle\, N(\mathcal{S}^n)\,\omega_m$; combined with $\langle \eta_+, u \rangle \le 2\|\eta\|_1$, we have
\[ \eta \le 2\|\eta\|_1\, N(\mathcal{S}^n)\,\omega_m. \]
Also, by Lemma 9, there is $\xi \in (B^n)^*$ with $\|\xi - \omega\|_1 \le \varepsilon$ satisfying
\[ \xi \le N^\varepsilon(\mathcal{S}^n)\,\omega_m. \]
If $\eta := \omega - \xi$ (we may take $\xi$ with $\langle \xi, u \rangle = 1$, so that $\langle \eta, u \rangle = 0$), then
\begin{align*}
\omega = \xi + \eta &\le \big(N^\varepsilon(\mathcal{S}^n) + 2\|\eta\|_1 N(\mathcal{S}^n)\big)\,\omega_m \\
&\le \big(N^\varepsilon(\mathcal{S}^n) + 2\varepsilon N(\mathcal{S}^n)\big)\,\omega_m.
\end{align*}
Therefore, by the definition of $N(\mathcal{S}^n)$,
\[ N(\mathcal{S}^n) \le N^\varepsilon(\mathcal{S}^n) + 2\varepsilon N(\mathcal{S}^n). \]
Sorting out terms, for $\varepsilon < 1/2$,
\[ \frac{1}{n}\log_2 N^\varepsilon(\mathcal{S}^n) \ge \frac{1}{n}\log_2 N(\mathcal{S}^n) + \frac{1}{n}\log_2(1 - 2\varepsilon). \]
Taking the limit $n \to \infty$ of both ends, we have the assertion.

5 An upper bound to maximized mutual information
Suppose $B^n$ equals the $n$-th tensor power $B^{\otimes n}$. Choose $\mathcal{S}^n$ so that $\dim \mathcal{S}^n = \dim B^n - 1 = (\dim B)^n - 1$. Then, so long as each effect of each measurement lives in the dual cone of $\mathcal{S}^n$,
\[ N^\varepsilon(F^n) \le N(F^n) \le \dim \mathcal{S}^n + 1 = (\dim B)^n, \]
and thus
\[ C \le \log_2 \dim B = \log_2(\dim \mathcal{S} + 1). \]
Note also that the optimized mutual information $\sup_M I_M$ on a single system equals the capacity when the measurement is restricted to compositions of identical measurements on each system followed by classical information processing. Thus,
\[ \sup_M I_M \le C \le \log_2 \dim B = \log_2(\dim \mathcal{S} + 1), \]
confirming the known upper bound to $\sup_M I_M$.

6 Quantum channels
Let the system $\mathcal{S}^n$ be the quantum system with underlying Hilbert space $\mathcal{H}^{\otimes n}$, and the measurement set $\mathcal{M}^n$ be the set of all positive-operator-valued measures over finite sets. In this case, the bound (15) is tight, as is shown below. In particular, if the channel $F^n$ is the convex closure of $F^{\otimes n}$ ($F \subset \mathcal{S}$), or equivalently, for repeated use of a memoryless channel with separable signal states, (15) equals the celebrated Holevo bound.

By (15),
\begin{align*}
C &\le \lim_{\varepsilon\downarrow 0}\ \inf_{\{\omega^n_0\in\mathcal{S}^n\}_{n=1}^\infty}\ \sup_{\{\omega^n\in F^n\}_{n=1}^\infty}\ \lim_{n\to\infty}\frac{1}{n} D^\varepsilon_{\max}(\omega^n\|\omega^n_0) \\
&= \sup_{\varepsilon>0}\ \inf_{\{\omega^n_0\in\mathcal{S}^n\}_{n=1}^\infty}\ \sup_{\{\omega^n\in F^n\}_{n=1}^\infty}\ \lim_{n\to\infty}\frac{1}{n} D^\varepsilon_{\max}(\omega^n\|\omega^n_0) \\
&\le \inf_{\{\omega^n_0\in\mathcal{S}^n\}_{n=1}^\infty}\ \sup_{\{\omega^n\in F^n\}_{n=1}^\infty}\ \sup_{\varepsilon>0}\ \lim_{n\to\infty}\frac{1}{n} D^\varepsilon_{\max}(\omega^n\|\omega^n_0) \\
&= \inf_{\{\omega^n_0\in\mathcal{S}^n\}_{n=1}^\infty}\ \sup_{\{\omega^n\in F^n\}_{n=1}^\infty} D(\{\omega^n\}\|\{\omega^n_0\}), \tag{16}
\end{align*}
where we define
\[ D(\{\omega^n\}\|\{\omega^n_0\}) := \inf\{\lambda;\ \lim_{n\to\infty}\operatorname{tr}\,[\omega^n - 2^{n\lambda}\omega^n_0]_+ = 0\}, \]
and (16) is due to Theorem 2 of [2].
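Here $\operatorname{tr}[A]_+$ denotes the trace of the positive part of the Hermitian operator $A$, i.e., the sum of its positive eigenvalues. A small numerical sketch of ours, with toy states:

\begin{verbatim}
import numpy as np

def tr_plus(a):
    """tr[A]_+ for Hermitian A: the sum of its positive eigenvalues."""
    return np.clip(np.linalg.eigvalsh(a), 0.0, None).sum()

omega = np.diag([0.7, 0.3])                 # a toy qubit state (diagonal for brevity)
omega0 = np.eye(2) / 2                      # maximally mixed reference state
lam = 0.3
print(tr_plus(omega - 2**lam * omega0))     # tr[omega - 2^lambda * omega0]_+ ~ 0.084
\end{verbatim}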


On the other hand, it is known that
\[ C = \sup_{\{p^n\}}\ \inf_{\{\omega^n_0\in\mathcal{S}^n\}_{n=1}^\infty}\ \inf\Big\{\lambda;\ \lim_{n\to\infty} \sum_{\omega^n\in F^n} p^n(\omega^n)\operatorname{tr}\,[\omega^n - 2^{n\lambda}\omega^n_0]_+ = 0\Big\}, \tag{17} \]
where $p^n$ is a probability distribution over $F^n$ with finite support [4]. As is shown below, the RHS of (16) in fact equals the RHS of (17), indicating that the bound (15) is tight:
\[ C = \inf_{\{\omega^n_0\in\mathcal{S}^n\}_{n=1}^\infty}\ \sup_{\{\omega^n\in F^n\}_{n=1}^\infty} D(\{\omega^n\}\|\{\omega^n_0\}). \tag{18} \]

Remark 12 The above definition of $D$ is from [2], but it is equivalent to the definition given by Nagaoka, as is proved in Proposition 1 of [2]. The RHS of (17) is in fact written in the spirit of [2], and thus is slightly different from [4]. These two are also equivalent, since the RHS of (17) can be written as $D$ of block-diagonal density operators.

A technical lemma is in order.

Lemma 13 Suppose $f_n(\lambda, a)$ and $g_n(\lambda, a)$ are functions from $[0,\infty) \times A_n$ to $[0,\infty)$, non-increasing in $\lambda$. Suppose also that $\min_{a_n\in A_n} f_n(\lambda, a_n)$ exists for all $\lambda \ge 0$ and $n \in \mathbb{N}$. Then
\[ \inf_{\{a_n\}_{n=1}^\infty}\ \inf_\lambda \{\lambda;\ \lim_{n\to\infty} f_n(\lambda, a_n) = 0\} = \inf_\lambda \{\lambda;\ \lim_{n\to\infty} \min_{a_n\in A_n} f_n(\lambda, a_n) = 0\} \tag{19} \]
and
\[ \sup_{\{a_n\}_{n=1}^\infty}\ \inf_\lambda \{\lambda;\ \lim_{n\to\infty} g_n(\lambda, a_n) = 0\} = \inf_\lambda \{\lambda;\ \lim_{n\to\infty} \sup_{a_n\in A_n} g_n(\lambda, a_n) = 0\}. \tag{20} \]

Proof. By (10),
\[ \inf_{\{a_n\}_{n=1}^\infty}\ \inf_\lambda \{\lambda;\ \lim_{n\to\infty} f_n(\lambda, a_n) = 0\} = \inf_\lambda \{\lambda;\ \exists \{a_n\}_{n=1}^\infty,\ \lim_{n\to\infty} f_n(\lambda, a_n) = 0\}. \]
Thus, to show (19), it suffices to show that
\[ \exists \{a_n\}_{n=1}^\infty,\ \lim_{n\to\infty} f_n(\lambda, a_n) = 0 \iff \lim_{n\to\infty} \min_{a_n\in A_n} f_n(\lambda, a_n) = 0. \]
To show $\Leftarrow$, we only have to take the minimizer of $\min_{a_n\in A_n} f_n(\lambda, a_n)$. To show the opposite implication, suppose the LHS is true, with $\lim_{n\to\infty} f_n(\lambda, a^*_n) = 0$. Then
\[ \lim_{n\to\infty} \min_{a_n\in A_n} f_n(\lambda, a_n) \le \lim_{n\to\infty} f_n(\lambda, a^*_n) = 0, \]
proving $\Rightarrow$.
Next, by (11),
\[ \sup_{\{a_n\}_{n=1}^\infty}\ \inf_\lambda \{\lambda;\ \lim_{n\to\infty} g_n(\lambda, a_n) = 0\} = \inf_\lambda \{\lambda;\ \forall \{a_n\}_{n=1}^\infty,\ \lim_{n\to\infty} g_n(\lambda, a_n) = 0\}. \]
Thus, to show (20), it suffices to show that
\[ \forall \{a_n\}_{n=1}^\infty,\ \lim_{n\to\infty} g_n(\lambda, a_n) = 0 \iff \lim_{n\to\infty} \sup_{a_n\in A_n} g_n(\lambda, a_n) = 0. \]
$\Leftarrow$ is shown by observing that
\[ \lim_{n\to\infty} g_n(\lambda, a_n) \le \lim_{n\to\infty} \sup_{a_n\in A_n} g_n(\lambda, a_n) \]
holds for any $\{a_n\}_{n=1}^\infty$. To show the opposite implication, suppose the RHS is untrue,
\[ \varlimsup_{n\to\infty}\ \sup_{a_n\in A_n} g_n(\lambda, a_n) > 0. \]
Then, choosing $a^*_n$ with
\[ g_n(\lambda, a^*_n) \ge \sup_{a_n\in A_n} g_n(\lambda, a_n) - \varepsilon, \]
where $\varepsilon := \frac{1}{2}\varlimsup_{n\to\infty} \sup_{a_n\in A_n} g_n(\lambda, a_n) > 0$, we have
\[ \varlimsup_{n\to\infty} g_n(\lambda, a^*_n) \ge \varlimsup_{n\to\infty}\ \sup_{a_n\in A_n} g_n(\lambda, a_n) - \varepsilon = \frac{1}{2}\varlimsup_{n\to\infty}\ \sup_{a_n\in A_n} g_n(\lambda, a_n) > 0. \]
Therefore, the LHS is also untrue, showing $\Rightarrow$.

By (19) and (20), (17) is rewritten as
\begin{align*}
C &= \inf\Big\{\lambda;\ \lim_{n\to\infty}\ \sup_{p^n}\ \min_{\omega^n_0\in\mathcal{S}^n}\ \sum_{\omega^n\in F^n} p^n(\omega^n)\operatorname{tr}\,[\omega^n - 2^{n\lambda}\omega^n_0]_+ = 0\Big\} \\
&= \inf\Big\{\lambda;\ \lim_{n\to\infty}\ \min_{\omega^n_0\in\mathcal{S}^n}\ \sup_{p^n}\ \sum_{\omega^n\in F^n} p^n(\omega^n)\operatorname{tr}\,[\omega^n - 2^{n\lambda}\omega^n_0]_+ = 0\Big\} \\
&= \inf\Big\{\lambda;\ \lim_{n\to\infty}\ \min_{\omega^n_0\in\mathcal{S}^n}\ \sup_{\omega^n\in F^n}\ \operatorname{tr}\,[\omega^n - 2^{n\lambda}\omega^n_0]_+ = 0\Big\},
\end{align*}
where to derive the second identity we used Fan's minimax theorem:

Lemma 14 (Fan’s minimax theorem, [1] ) Suppose that X be a compact convex


subset of vector space, and Y be a convex subset of a vector space. Assume
that f : X × Y → R satisfies following conditions: (1) x → f (x, y) is lower
semicontinuous and convex on X for every y ∈ Y: (2) y → f (x, y) is concave
on Y for every x ∈ X . Then

min sup f (x, y) = sup min f (x, y) .


x∈X y∈Y y∈Y x∈X

Finally, using (19) and (20) again,
\begin{align*}
C &= \inf_{\{\omega^n_0\in\mathcal{S}^n\}}\ \sup_{\{\omega^n\in F^n\}}\ \inf\big\{\lambda;\ \lim_{n\to\infty}\operatorname{tr}\,[\omega^n - 2^{n\lambda}\omega^n_0]_+ = 0\big\} \\
&= \inf_{\{\omega^n_0\in\mathcal{S}^n\}}\ \sup_{\{\omega^n\in F^n\}} D(\{\omega^n\}_{n=1}^\infty\|\{\omega^n_0\}_{n=1}^\infty),
\end{align*}
which is (18).
If $F^n$ represents the repeated use of the memoryless channel represented by $F$, or equivalently $F^n$ is the convex closure of $F^{\otimes n}$, the RHS of (18) equals the so-called Holevo bound of the quantum-classical channel,
\[ \inf_{\omega_0\in\mathcal{S}}\ \sup_{\omega\in F} D(\omega\|\omega_0) = C^\dagger(\{F^n\}_{n=1}^\infty) := \sup\Big\{R;\ \lim_{n\to\infty} \frac{1}{|\mathcal{X}^n|}\sum_{x\in\mathcal{X}^n} \langle \omega^n(x), m^n_x \rangle = 1,\ |\mathcal{X}^n| = \lfloor 2^{nR}\rfloor\Big\}, \]
i.e., the optimal asymptotic rate of information transmission when the asymptotic error rate is zero (we write $C^\dagger$ for this zero-error-rate capacity to distinguish it from $C$). Since in the definition of $C(\{F^n\}_{n=1}^\infty)$ an asymptotic error rate strictly less than $1$ is tolerated,
\[ C(\{F^n\}_{n=1}^\infty) \ge C^\dagger(\{F^n\}_{n=1}^\infty). \]
But in the case of the quantum memoryless channel, the identity holds [5][6]:
\[ C(\{F^n\}_{n=1}^\infty) = C^\dagger(\{F^n\}_{n=1}^\infty) = \inf_{\omega_0\in\mathcal{S}}\ \sup_{\omega\in F} D(\omega\|\omega_0). \]

References

[1] J. M. Borwein and D. Zhuang, "On Fan's minimax theorem," Mathematical Programming, vol. 34, pp. 232-234, 1986.

[2] N. Datta, "Min- and max-relative entropies and a new entanglement monotone," IEEE Transactions on Information Theory, vol. 55, no. 6, pp. 2816-2826, 2009.

[3] N. Datta and R. Renner, "Smooth Rényi entropies and the quantum information spectrum," IEEE Transactions on Information Theory, vol. 55, pp. 2807-2815, 2009 (doi: 10.1109/TIT.2009.2018340).

[4] M. Hayashi and H. Nagaoka, "General formulas for capacity of classical-quantum channels," IEEE Transactions on Information Theory, vol. 49, p. 1753, 2003.

[5] T. Ogawa and H. Nagaoka, "Strong converse to the quantum channel coding theorem," IEEE Transactions on Information Theory, vol. 45, no. 7, pp. 2486-2489, 1999.

[6] A. Winter, Coding Theorems of Quantum Information Theory, Ph.D. dissertation, Universität Bielefeld, 1999 (https://arxiv.org/abs/quant-ph/9907077).

[7] D. G. Luenberger, Optimization by Vector Space Methods, John Wiley & Sons, New York, 1969.

Appendix A: Lagrange duality of convex programs
This section reviews useful known facts about convex optimization problems with constraints. Let $f$ be a real-valued functional on a subset $\Omega$ of a vector space $D$, and let $G$ be a mapping of $D$ into a normed and ordered space $B$. Consider the following maximization problem with convex constraint:
\[ \sup_{M\in\Omega,\ G(M)\ge 0} f(M). \]

Then for any positive element $\eta$ of $B^*$,
\[ \sup_{M\in\Omega,\ G(M)\ge 0} f(M) \le \sup_{M\in\Omega,\ G(M)\ge 0} f(M) + \langle \eta, G(M) \rangle \le \sup_{M\in\Omega} f(M) + \langle \eta, G(M) \rangle. \]
Therefore,
\[ \sup_{M\in\Omega,\ G(M)\ge 0} f(M) \le \inf_{\eta\ge 0} \varphi(\eta), \tag{21} \]
where
\[ \varphi(\eta) := \sup_{M\in\Omega} f(M) + \langle \eta, G(M) \rangle. \]

The inequality (21) is called weak duality. Remark that neither concavity of $f$ and $G$ nor convexity of $\Omega$ is needed for the weak duality to hold. However, for the inequality (21) to be saturated, such conditions are needed in general. The following lemma gives conditions for strong duality, that is, the saturation of the inequality.
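As a toy numerical check of (21) (our illustration, not part of the review): take $f(M) = \langle c, M \rangle$, $\Omega = \{M \ge 0\}$ entrywise, and $G(M) = u - M$, so the constraint is $0 \le M \le u$; then $\varphi(\eta) = \langle \eta, u \rangle$ whenever $\eta \ge c$, and $\infty$ otherwise.

\begin{verbatim}
import numpy as np

c = np.array([0.5, -0.2, 1.0])
u = np.ones(3)

# Primal optimum of sup{<c, M> : 0 <= M <= u}: set M_i = 1 exactly where c_i > 0.
primal = np.maximum(c, 0.0) @ u

# A feasible dual point eta >= 0 with eta >= c, for which phi(eta) = <eta, u>.
eta = np.maximum(c, 0.0)
phi = eta @ u

assert primal <= phi + 1e-12     # weak duality (21); here it is in fact tight
\end{verbatim}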

Lemma 15 (Theorem 1, Section 8.6 of [7]) Suppose that $f$ and $G$ are concave and $\Omega$ is convex. Suppose also that there exists an $M_1 \in \Omega$ such that $G(M_1) > 0$, and that $\sup\{f(M);\ G(M) \ge 0,\ M \in \Omega\}$ is finite. Then
\[ \sup_{M\in\Omega,\ G(M)\ge 0} f(M) = \min_{\eta\ge 0} \varphi(\eta), \]
and the minimum on the right is achieved by some $\eta_0 \ge 0$.

If the supremum on the left is achieved by some $M_0 \in \Omega$, then
\[ \langle \eta_0, G(M_0) \rangle = 0 \]
and $M_0$ maximizes $f(M) + \langle \eta_0, G(M) \rangle$ over $M \in \Omega$.

