Classical capacity
In quantum information theory, the classical capacity of a quantum channel is the maximum rate at which classical data can be sent over it error-free in the limit of many uses of the channel. Holevo, Schumacher, and Westmoreland proved the following least upper bound on the classical capacity of any quantum channel $\mathcal{N}$:

$$\chi(\mathcal{N}) = \max_{\rho^{XA}} I(X;B)_{\mathcal{N}(\rho^{XA})},$$

where $\rho^{XA}$ is a classical-quantum state of the following form:

$$\rho^{XA} = \sum_x p_X(x)\,\vert x\rangle\langle x\vert^X \otimes \rho_x^A,$$

$p_X(x)$ is a probability distribution, and each $\rho_x^A$ is a density operator that can be input to the channel $\mathcal{N}$.
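For a fixed ensemble $\{p_X(x), \rho_x\}$, the quantity being maximized equals the Holevo quantity $\chi = S(\rho) - \sum_x p_X(x) S(\rho_x)$, where $\rho$ is the average state. A minimal numerical sketch in Python/NumPy (the ensemble of $\vert 0\rangle$ and $\vert +\rangle$ is an illustrative choice, not from the article):

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr{rho log2 rho}, computed from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # drop zero eigenvalues (0 log 0 = 0)
    return float(-np.sum(evals * np.log2(evals)))

def holevo_information(probs, states):
    """Holevo quantity chi = S(avg state) - sum_x p_x S(rho_x),
    which equals I(X;B) of the corresponding classical-quantum state."""
    avg = sum(p * r for p, r in zip(probs, states))
    return von_neumann_entropy(avg) - sum(p * von_neumann_entropy(r)
                                          for p, r in zip(probs, states))

# Example ensemble: equiprobable non-orthogonal pure states |0> and |+>
ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
chi = holevo_information([0.5, 0.5],
                         [np.outer(ket0, ket0), np.outer(ketp, ketp)])
```

For this ensemble the pure states contribute zero entropy, so $\chi$ reduces to the entropy of the average state (numerically about 0.60 bits).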
Achievability using sequential decoding
We briefly review the HSW coding theorem (the statement of the achievability of the Holevo information rate $I(X;B)$ for communicating classical data over a quantum channel). We first review the minimal amount of quantum mechanics needed for the theorem. We then cover quantum typicality, and finally we prove the theorem using a recent sequential decoding technique.
Review of quantum mechanics
In order to prove the HSW coding theorem, we really just need a few basic things from quantum mechanics. First, a quantum state is a unit trace, positive operator known as a density operator. Usually, we denote it by $\rho$, $\sigma$, $\omega$, etc. The simplest model for a quantum channel is known as a classical-quantum channel:

$$x \mapsto \rho_x.$$

The meaning of the above notation is that inputting the classical letter $x$ at the transmitting end leads to a quantum state $\rho_x$ at the receiving end. It is the task of the receiver to perform a measurement to determine the input of the sender. If it is true that the states $\rho_x$ are perfectly distinguishable from one another (i.e., if they have orthogonal supports such that $\mathrm{Tr}\{\rho_x \rho_{x'}\} = 0$ for $x \neq x'$), then the channel is a noiseless channel. We are interested in situations for which this is not the case. If it is true that the states $\rho_x$ all commute with one another, then this is effectively identical to the situation for a classical channel, so we are also not interested in these situations. So, the situation in which we are interested is that in which the states $\rho_x$ have overlapping support and are non-commutative.
The most general way to describe a quantum measurement is with a positive operator-valued measure (POVM). We usually denote the elements of a POVM as $\{\Lambda_m\}_m$. These operators should satisfy positivity and completeness in order to form a valid POVM:

$$\Lambda_m \geq 0 \;\;\forall m, \qquad \sum_m \Lambda_m = I.$$

The probabilistic interpretation of quantum mechanics states that if someone measures a quantum state $\rho$ using a measurement device corresponding to the POVM $\{\Lambda_m\}$, then the probability $p(m)$ for obtaining outcome $m$ is equal to

$$p(m) = \mathrm{Tr}\{\Lambda_m \rho\},$$

and the post-measurement state is

$$\rho_m' = \frac{\sqrt{\Lambda_m}\,\rho\,\sqrt{\Lambda_m}}{p(m)},$$

if the person measuring obtains outcome $m$. These rules are sufficient for us to consider classical communication schemes over cq channels.
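The two measurement rules above can be sketched numerically; a minimal example with an assumed single-qubit state and a two-outcome projective POVM (both illustrative choices):

```python
import numpy as np

rho = np.array([[0.75, 0.25],
                [0.25, 0.25]])            # a valid density operator

# Two-outcome POVM in the computational basis
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
assert np.allclose(sum(povm), np.eye(2))  # completeness

def measure(rho, povm):
    """Return outcome probabilities p(m) = Tr{Lambda_m rho} and the normalized
    post-measurement states sqrt(Lambda_m) rho sqrt(Lambda_m) / p(m)."""
    probs, posts = [], []
    for lam in povm:
        p = float(np.trace(lam @ rho).real)
        # operator square root of a PSD matrix via its eigendecomposition
        w, v = np.linalg.eigh(lam)
        sqrt_lam = v @ np.diag(np.sqrt(np.clip(w, 0, None))) @ v.conj().T
        probs.append(p)
        posts.append(sqrt_lam @ rho @ sqrt_lam / p if p > 0 else None)
    return probs, posts

probs, posts = measure(rho, povm)         # probs == [0.75, 0.25]
```

Each post-measurement state is again a unit-trace density operator, consistent with the renormalization by $p(m)$.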
Quantum typicality
The reader can find a good review of this topic in the article about the typical subspace.
Gentle operator lemma
The following lemma is important for our proofs. Itdemonstrates that a measurement that succeeds with high probability on averagedoes not disturb the state too much on average:
Lemma: [Winter] Given anensemble
\left\{pX\left(x\right),\rhox\right\}
with expecteddensity operator
\rho\equiv\sumxpX\left(x\right)\rhox
, supposethat an operator
such that
succeeds with highprobability on the state
:
Tr\left\{Λ\rho\right\}\geq1-\epsilon.
Then the subnormalized state
is closein expected trace distance to the original state
:
EX\left\{\left\Vert\sqrt{Λ}\rhoX\sqrt{Λ}
-\rhoX\right\Vert1\right\}\leq2\sqrt{\epsilon}.
(Note that
is the nuclear norm of the operator
so that
\left\VertA\right\Vert1\equiv
Tr
\left\{\sqrt{A\daggerA}\right\}
.)
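The lemma can be spot-checked numerically; a minimal sketch with an assumed two-state ensemble and a diagonal $\Lambda$ (both illustrative choices, not from the article):

```python
import numpy as np

def trace_norm(a):
    """Nuclear norm ||A||_1 = Tr{sqrt(A^dagger A)} = sum of singular values."""
    return float(np.sum(np.linalg.svd(a, compute_uv=False)))

# Ensemble: equiprobable |0><0| and |+><+|
ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
probs = [0.5, 0.5]
states = [np.outer(ket0, ket0), np.outer(ketp, ketp)]
rho = sum(p * r for p, r in zip(probs, states))    # expected density operator

lam = np.diag([1.0, 0.9])                 # an operator with 0 <= Lambda <= I
eps = 1.0 - float(np.trace(lam @ rho).real)        # Tr{Lambda rho} = 1 - eps
sqrt_lam = np.sqrt(lam)                   # elementwise sqrt suffices: Lambda is diagonal

# Expected trace distance between sqrt(Lambda) rho_x sqrt(Lambda) and rho_x
expected_dist = sum(p * trace_norm(sqrt_lam @ r @ sqrt_lam - r)
                    for p, r in zip(probs, states))
```

Here the measured disturbance `expected_dist` stays well below the lemma's guarantee of $2\sqrt{\epsilon}$.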
The following inequality is useful for us as well. It holds for any operators $\rho$, $\sigma$, $\Lambda$ such that $\rho, \sigma \geq 0$ and $0 \leq \Lambda \leq I$:

$$\mathrm{Tr}\{\Lambda\rho\} \leq \mathrm{Tr}\{\Lambda\sigma\} + \left\Vert\rho - \sigma\right\Vert_1.$$

The quantum information-theoretic interpretation of the above inequality is that the probability of obtaining outcome $\Lambda$ from a quantum measurement acting on the state $\rho$ is upper bounded by the probability of obtaining outcome $\Lambda$ on the state $\sigma$ summed with the distinguishability of the two states $\rho$ and $\sigma$.
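A randomized sanity check of this inequality (the random-state construction is an illustrative assumption, not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)

def trace_norm(a):
    """Nuclear norm: sum of singular values."""
    return float(np.sum(np.linalg.svd(a, compute_uv=False)))

def random_density(d):
    """Random d-dimensional density operator (Wishart-normalized)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    m = g @ g.conj().T                    # positive semi-definite
    return m / np.trace(m).real

def random_effect(d):
    """Random operator with 0 <= Lambda <= I (eigenvalues rescaled to [0, 1])."""
    h = rng.normal(size=(d, d))
    h = h + h.T
    w, v = np.linalg.eigh(h)
    w = (w - w.min()) / (w.max() - w.min())
    return v @ np.diag(w) @ v.T

violations = 0
for _ in range(100):
    rho, sigma = random_density(3), random_density(3)
    lam = random_effect(3)
    lhs = float(np.trace(lam @ rho).real)
    rhs = float(np.trace(lam @ sigma).real) + trace_norm(rho - sigma)
    if lhs > rhs + 1e-10:
        violations += 1
```

No violations occur, as the inequality holds for all such triples.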
Non-commutative union bound
Lemma [Sen's bound]: The following bound holds for a subnormalized state $\sigma$ such that $\sigma \geq 0$ and $\mathrm{Tr}\{\sigma\} \leq 1$, with $\Pi_1$, ..., $\Pi_N$ being projectors:

$$\mathrm{Tr}\{\sigma\} - \mathrm{Tr}\left\{\Pi_N \cdots \Pi_1\, \sigma\, \Pi_1 \cdots \Pi_N\right\} \leq 2\sqrt{\sum_{i=1}^N \mathrm{Tr}\left\{\left(I - \Pi_i\right)\sigma\right\}}.$$
We can think of Sen's bound as a "non-commutative union bound" because it is analogous to the following union bound from probability theory:

$$\Pr\left\{\left(A_1 \cap \cdots \cap A_N\right)^c\right\} = \Pr\left\{A_1^c \cup \cdots \cup A_N^c\right\} \leq \sum_{i=1}^N \Pr\left\{A_i^c\right\},$$

where $A_1, \ldots, A_N$ are events. The analogous bound for projector logic would be

$$\mathrm{Tr}\left\{\left(I - \Pi_1 \cdots \Pi_N \cdots \Pi_1\right)\rho\right\} \leq \sum_{i=1}^N \mathrm{Tr}\left\{\left(I - \Pi_i\right)\rho\right\},$$

if we think of $\Pi_1 \cdots \Pi_N \cdots \Pi_1$ as a projector onto the intersection of subspaces. Though, the above bound only holds if the projectors $\Pi_1$, ..., $\Pi_N$ are commuting (choosing $\Pi_1 = \vert+\rangle\langle+\vert$, $\Pi_2 = \vert0\rangle\langle0\vert$, and $\rho = \vert0\rangle\langle0\vert$ gives a counterexample). If the projectors are non-commuting, then Sen's bound is the next best thing and suffices for our purposes here.
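The stated counterexample is easy to verify numerically. With $\Pi_1 = \vert+\rangle\langle+\vert$, $\Pi_2 = \vert0\rangle\langle0\vert$, and $\rho = \vert0\rangle\langle0\vert$, the commuting-projector bound fails:

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
pi1 = np.outer(ketp, ketp)                # |+><+|
pi2 = np.outer(ket0, ket0)                # |0><0|
rho = np.outer(ket0, ket0)                # |0><0|

eye = np.eye(2)
# "Intersection" error term: Tr{(I - Pi1 Pi2 Pi1) rho}
lhs = float(np.trace((eye - pi1 @ pi2 @ pi1) @ rho))          # = 0.75
# Union-bound side: sum_i Tr{(I - Pi_i) rho}
rhs = float(np.trace((eye - pi1) @ rho)
            + np.trace((eye - pi2) @ rho))                    # = 0.5
```

Since `lhs` exceeds `rhs`, the commuting-projector bound indeed fails for these non-commuting projectors.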
HSW theorem with the non-commutative union bound
We now prove the HSW theorem with Sen's non-commutative union bound. We divide up the proof into a few parts: codebook generation, POVM construction, and error analysis.
Codebook Generation. We first describe how Alice and Bob agree on a random choice of code. They have the channel $x \rightarrow \rho_x$ and a distribution $p_X(x)$. They choose $M$ classical sequences $x^n$ according to the IID distribution $p_{X^n}(x^n)$. After selecting them, they label them with indices as $\{x^n(m)\}_{m\in[M]}$. This leads to the following quantum codewords:

$$\rho_{x^n(m)} = \rho_{x_1(m)} \otimes \cdots \otimes \rho_{x_n(m)}.$$

The quantum codebook is then $\{\rho_{x^n(m)}\}$. The average state of the codebook is then

$$\mathbb{E}_{X^n}\left\{\rho_{X^n}\right\} = \sum_{x^n} p_{X^n}(x^n)\,\rho_{x^n} = \rho^{\otimes n},$$

where $\rho = \sum_x p_X(x)\,\rho_x$.
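The random codebook construction can be sketched for a small binary-input cq channel (the qubit channel states, block length, and codebook size here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

p_X = np.array([0.5, 0.5])
channel_states = [np.array([[1.0, 0.0], [0.0, 0.0]]),   # rho_0 = |0><0|
                  np.array([[0.5, 0.5], [0.5, 0.5]])]   # rho_1 = |+><+|
n, M = 4, 3                                             # block length, codebook size

# M classical sequences x^n(m), drawn IID from p_X
codebook = rng.choice(len(p_X), size=(M, n), p=p_X)

def quantum_codeword(xn):
    """Tensor-product codeword rho_{x_1} (x) ... (x) rho_{x_n}."""
    out = np.array([[1.0]])
    for x in xn:
        out = np.kron(out, channel_states[x])
    return out

codewords = [quantum_codeword(xn) for xn in codebook]   # each 2^n x 2^n, unit trace
```

Note the exponential growth of the codeword dimension with $n$, which is why such direct simulation is only feasible for tiny block lengths.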
POVM Construction. Sen's bound from the above lemma suggests a method for Bob to decode a state that Alice transmits. Bob should first ask "Is the received state in the average typical subspace?" He can do this operationally by performing a typical subspace measurement corresponding to $\{\Pi_{\rho,\delta}^n,\, I - \Pi_{\rho,\delta}^n\}$. Next, he asks in sequential order, "Is the received codeword in the $m^{\text{th}}$ conditionally typical subspace?" This is in some sense equivalent to the question, "Is the received codeword the $m^{\text{th}}$ transmitted codeword?" He can ask these questions operationally by performing the measurements corresponding to the conditionally typical projectors $\{\Pi_{\rho_{x^n(m)},\delta},\, I - \Pi_{\rho_{x^n(m)},\delta}\}$.
Why should this sequential decoding scheme work well? The reason is that the transmitted codeword lies in the typical subspace on average:

$$\mathbb{E}_{X^n}\left\{\mathrm{Tr}\left\{\Pi_{\rho,\delta}^n\,\rho_{X^n}\right\}\right\} = \mathrm{Tr}\left\{\Pi_{\rho,\delta}^n\,\mathbb{E}_{X^n}\left\{\rho_{X^n}\right\}\right\} = \mathrm{Tr}\left\{\Pi_{\rho,\delta}^n\,\rho^{\otimes n}\right\} \geq 1-\epsilon,$$

where the inequality follows from the properties of the typical subspace. Also, the projectors $\Pi_{\rho_{X^n(m)},\delta}$ are "good detectors" for the states $\rho_{X^n(m)}$ (on average) because the following condition holds from conditional quantum typicality:

$$\mathbb{E}_{X^n}\left\{\mathrm{Tr}\left\{\Pi_{\rho_{X^n(m)},\delta}\,\rho_{X^n(m)}\right\}\right\} \geq 1-\epsilon.$$
Error Analysis. The probability of detecting the $m^{\text{th}}$ codeword correctly under our sequential decoding scheme is equal to

$$\mathrm{Tr}\left\{\Pi_{\rho_{X^n(m)},\delta}\;\hat{\Pi}_{\rho_{X^n(m-1)},\delta}\cdots\hat{\Pi}_{\rho_{X^n(1)},\delta}\;\Pi_{\rho,\delta}^n\;\rho_{X^n(m)}\;\Pi_{\rho,\delta}^n\;\hat{\Pi}_{\rho_{X^n(1)},\delta}\cdots\hat{\Pi}_{\rho_{X^n(m-1)},\delta}\;\Pi_{\rho_{X^n(m)},\delta}\right\},$$

where we make the abbreviation $\hat{\Pi} \equiv I - \Pi$. (Observe that we project into the average typical subspace just once.) Thus, the probability of an incorrect detection for the $m^{\text{th}}$ codeword is given by

$$1 - \mathrm{Tr}\left\{\Pi_{\rho_{X^n(m)},\delta}\;\hat{\Pi}_{\rho_{X^n(m-1)},\delta}\cdots\hat{\Pi}_{\rho_{X^n(1)},\delta}\;\Pi_{\rho,\delta}^n\;\rho_{X^n(m)}\;\Pi_{\rho,\delta}^n\;\hat{\Pi}_{\rho_{X^n(1)},\delta}\cdots\hat{\Pi}_{\rho_{X^n(m-1)},\delta}\;\Pi_{\rho_{X^n(m)},\delta}\right\},$$
and the average error probability of this scheme is equal to

$$1 - \frac{1}{M}\sum_m \mathrm{Tr}\left\{\Pi_{\rho_{X^n(m)},\delta}\;\hat{\Pi}_{\rho_{X^n(m-1)},\delta}\cdots\hat{\Pi}_{\rho_{X^n(1)},\delta}\;\Pi_{\rho,\delta}^n\;\rho_{X^n(m)}\;\Pi_{\rho,\delta}^n\;\hat{\Pi}_{\rho_{X^n(1)},\delta}\cdots\hat{\Pi}_{\rho_{X^n(m-1)},\delta}\;\Pi_{\rho_{X^n(m)},\delta}\right\}.$$
Instead of analyzing the average error probability, we analyze the expectation of the average error probability, where the expectation is with respect to the random choice of code:

$$1 - \frac{1}{M}\sum_m \mathbb{E}_{X^n}\left\{\mathrm{Tr}\left\{\Pi_{\rho_{X^n(m)},\delta}\;\hat{\Pi}_{\rho_{X^n(m-1)},\delta}\cdots\hat{\Pi}_{\rho_{X^n(1)},\delta}\;\Pi_{\rho,\delta}^n\;\rho_{X^n(m)}\;\Pi_{\rho,\delta}^n\;\hat{\Pi}_{\rho_{X^n(1)},\delta}\cdots\hat{\Pi}_{\rho_{X^n(m-1)},\delta}\;\Pi_{\rho_{X^n(m)},\delta}\right\}\right\}.$$
Our first step is to apply Sen's bound to the above quantity. But before doing so, we should rewrite the above expression just slightly, by observing that

$$\begin{aligned}
1 &= \mathbb{E}_{X^n}\left\{\frac{1}{M}\sum_m \mathrm{Tr}\left\{\rho_{X^n(m)}\right\}\right\}\\
&= \mathbb{E}_{X^n}\left\{\frac{1}{M}\sum_m \mathrm{Tr}\left\{\Pi_{\rho,\delta}^n\,\rho_{X^n(m)}\,\Pi_{\rho,\delta}^n\right\}\right\} + \mathbb{E}_{X^n}\left\{\frac{1}{M}\sum_m \mathrm{Tr}\left\{\hat{\Pi}_{\rho,\delta}^n\,\rho_{X^n(m)}\right\}\right\}\\
&= \mathbb{E}_{X^n}\left\{\frac{1}{M}\sum_m \mathrm{Tr}\left\{\Pi_{\rho,\delta}^n\,\rho_{X^n(m)}\,\Pi_{\rho,\delta}^n\right\}\right\} + \frac{1}{M}\sum_m \mathrm{Tr}\left\{\hat{\Pi}_{\rho,\delta}^n\,\mathbb{E}_{X^n}\left\{\rho_{X^n(m)}\right\}\right\}\\
&= \mathbb{E}_{X^n}\left\{\frac{1}{M}\sum_m \mathrm{Tr}\left\{\Pi_{\rho,\delta}^n\,\rho_{X^n(m)}\,\Pi_{\rho,\delta}^n\right\}\right\} + \mathrm{Tr}\left\{\hat{\Pi}_{\rho,\delta}^n\,\rho^{\otimes n}\right\}\\
&\leq \mathbb{E}_{X^n}\left\{\frac{1}{M}\sum_m \mathrm{Tr}\left\{\Pi_{\rho,\delta}^n\,\rho_{X^n(m)}\,\Pi_{\rho,\delta}^n\right\}\right\} + \epsilon.
\end{aligned}$$
Substituting this into the expression for the expected error probability (and forgetting about the small $\epsilon$ term for now) gives an upper bound of

$$\mathbb{E}_{X^n}\left\{\frac{1}{M}\sum_m \mathrm{Tr}\left\{\Pi_{\rho,\delta}^n\,\rho_{X^n(m)}\,\Pi_{\rho,\delta}^n\right\}\right\} - \mathbb{E}_{X^n}\left\{\frac{1}{M}\sum_m \mathrm{Tr}\left\{\Pi_{\rho_{X^n(m)},\delta}\;\hat{\Pi}_{\rho_{X^n(m-1)},\delta}\cdots\hat{\Pi}_{\rho_{X^n(1)},\delta}\;\Pi_{\rho,\delta}^n\;\rho_{X^n(m)}\;\Pi_{\rho,\delta}^n\;\hat{\Pi}_{\rho_{X^n(1)},\delta}\cdots\hat{\Pi}_{\rho_{X^n(m-1)},\delta}\;\Pi_{\rho_{X^n(m)},\delta}\right\}\right\}.$$
We then apply Sen's bound to this expression with $\sigma \equiv \Pi_{\rho,\delta}^n\,\rho_{X^n(m)}\,\Pi_{\rho,\delta}^n$ and the sequential projectors as $\Pi_{\rho_{X^n(m)},\delta}$, $\hat{\Pi}_{\rho_{X^n(m-1)},\delta}$, ..., $\hat{\Pi}_{\rho_{X^n(1)},\delta}$. This gives the upper bound
$$\mathbb{E}_{X^n}\left\{\frac{1}{M}\sum_m 2\left[\mathrm{Tr}\left\{\left(I-\Pi_{\rho_{X^n(m)},\delta}\right)\Pi_{\rho,\delta}^n\,\rho_{X^n(m)}\,\Pi_{\rho,\delta}^n\right\} + \sum_{i=1}^{m-1}\mathrm{Tr}\left\{\Pi_{\rho_{X^n(i)},\delta}\,\Pi_{\rho,\delta}^n\,\rho_{X^n(m)}\,\Pi_{\rho,\delta}^n\right\}\right]^{1/2}\right\}.$$
Due to concavity of the square root, we can bound this expression from above by

$$2\left[\mathbb{E}_{X^n}\left\{\frac{1}{M}\sum_m \mathrm{Tr}\left\{\left(I-\Pi_{\rho_{X^n(m)},\delta}\right)\Pi_{\rho,\delta}^n\,\rho_{X^n(m)}\,\Pi_{\rho,\delta}^n\right\} + \sum_{i=1}^{m-1}\mathrm{Tr}\left\{\Pi_{\rho_{X^n(i)},\delta}\,\Pi_{\rho,\delta}^n\,\rho_{X^n(m)}\,\Pi_{\rho,\delta}^n\right\}\right\}\right]^{1/2}$$

$$\leq 2\left[\mathbb{E}_{X^n}\left\{\frac{1}{M}\sum_m \mathrm{Tr}\left\{\left(I-\Pi_{\rho_{X^n(m)},\delta}\right)\Pi_{\rho,\delta}^n\,\rho_{X^n(m)}\,\Pi_{\rho,\delta}^n\right\} + \sum_{i\neq m}\mathrm{Tr}\left\{\Pi_{\rho_{X^n(i)},\delta}\,\Pi_{\rho,\delta}^n\,\rho_{X^n(m)}\,\Pi_{\rho,\delta}^n\right\}\right\}\right]^{1/2},$$

where the second bound follows by summing over all of the codewords not equal to the $m^{\text{th}}$ codeword (this sum can only be larger).
We now focus exclusively on showing that the term inside the square root can be made small. Consider the first term:

$$\mathbb{E}_{X^n}\left\{\frac{1}{M}\sum_m \mathrm{Tr}\left\{\left(I-\Pi_{\rho_{X^n(m)},\delta}\right)\Pi_{\rho,\delta}^n\,\rho_{X^n(m)}\,\Pi_{\rho,\delta}^n\right\}\right\} \leq \mathbb{E}_{X^n}\left\{\frac{1}{M}\sum_m \left[\mathrm{Tr}\left\{\left(I-\Pi_{\rho_{X^n(m)},\delta}\right)\rho_{X^n(m)}\right\} + \left\Vert \Pi_{\rho,\delta}^n\,\rho_{X^n(m)}\,\Pi_{\rho,\delta}^n - \rho_{X^n(m)}\right\Vert_1\right]\right\} \leq \epsilon + 2\sqrt{\epsilon},$$

where the first inequality follows from the operator inequality given earlier and the second inequality follows from the gentle operator lemma and the properties of unconditional and conditional typicality. Consider now the second term and the following chain of inequalities:
$$\begin{aligned}
\sum_{i\neq m}\mathbb{E}_{X^n}\left\{\mathrm{Tr}\left\{\Pi_{\rho_{X^n(i)},\delta}\,\Pi_{\rho,\delta}^n\,\rho_{X^n(m)}\,\Pi_{\rho,\delta}^n\right\}\right\} &= \sum_{i\neq m}\mathrm{Tr}\left\{\mathbb{E}_{X^n}\left\{\Pi_{\rho_{X^n(i)},\delta}\right\}\,\Pi_{\rho,\delta}^n\,\mathbb{E}_{X^n}\left\{\rho_{X^n(m)}\right\}\,\Pi_{\rho,\delta}^n\right\}\\
&= \sum_{i\neq m}\mathrm{Tr}\left\{\mathbb{E}_{X^n}\left\{\Pi_{\rho_{X^n(i)},\delta}\right\}\,\Pi_{\rho,\delta}^n\,\rho^{\otimes n}\,\Pi_{\rho,\delta}^n\right\}\\
&\leq \sum_{i\neq m} 2^{-n\left[H(B)-\delta\right]}\,\mathrm{Tr}\left\{\mathbb{E}_{X^n}\left\{\Pi_{\rho_{X^n(i)},\delta}\right\}\,\Pi_{\rho,\delta}^n\right\}.
\end{aligned}$$

The first equality follows because the codewords $X^n(m)$ and $X^n(i)$ are independent since they are different. The second equality follows from the expression for the average state of the codebook. The first inequality follows from the typical subspace property $\Pi_{\rho,\delta}^n\,\rho^{\otimes n}\,\Pi_{\rho,\delta}^n \leq 2^{-n\left[H(B)-\delta\right]}\,\Pi_{\rho,\delta}^n$. Continuing, we have
$$\begin{aligned}
&\leq \sum_{i\neq m} 2^{-n\left[H(B)-\delta\right]}\,\mathbb{E}_{X^n}\left\{\mathrm{Tr}\left\{\Pi_{\rho_{X^n(i)},\delta}\right\}\right\}\\
&\leq \sum_{i\neq m} 2^{-n\left[H(B)-\delta\right]}\,2^{n\left[H(B|X)+\delta\right]}\\
&= \sum_{i\neq m} 2^{-n\left[I(X;B)-2\delta\right]}\\
&\leq M\,2^{-n\left[I(X;B)-2\delta\right]}.
\end{aligned}$$

The first inequality follows from $\Pi_{\rho,\delta}^n \leq I$ and exchanging the trace with the expectation. The second inequality follows from the bound $\mathrm{Tr}\left\{\Pi_{\rho_{X^n(i)},\delta}\right\} \leq 2^{n\left[H(B|X)+\delta\right]}$ on the rank of the conditionally typical projector. The next two are straightforward.
Putting everything together, we get our final bound on the expectation of the average error probability:

$$1 - \frac{1}{M}\sum_m \mathbb{E}_{X^n}\left\{\mathrm{Tr}\left\{\Pi_{\rho_{X^n(m)},\delta}\;\hat{\Pi}_{\rho_{X^n(m-1)},\delta}\cdots\hat{\Pi}_{\rho_{X^n(1)},\delta}\;\Pi_{\rho,\delta}^n\;\rho_{X^n(m)}\;\Pi_{\rho,\delta}^n\;\hat{\Pi}_{\rho_{X^n(1)},\delta}\cdots\hat{\Pi}_{\rho_{X^n(m-1)},\delta}\;\Pi_{\rho_{X^n(m)},\delta}\right\}\right\} \leq \epsilon + 2\left[\left(\epsilon+2\sqrt{\epsilon}\right) + M\,2^{-n\left[I(X;B)-2\delta\right]}\right]^{1/2}.$$
Thus, as long as we choose $M = 2^{n\left[I(X;B)-3\delta\right]}$, there exists a code with vanishing error probability.
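A quick numeric illustration of why this rate choice works (the values of $I(X;B)$ and $\delta$ below are hypothetical): with codebook size $M = 2^{n[I(X;B)-3\delta]}$, the interference term $M\,2^{-n[I(X;B)-2\delta]}$ in the final bound equals $2^{-n\delta}$ and vanishes exponentially in $n$.

```python
import numpy as np

I_XB, delta = 0.6, 0.05        # hypothetical Holevo rate and typicality parameter
terms = []
for n in [50, 100, 200]:
    log2_M = n * (I_XB - 3 * delta)                 # log2 of codebook size
    log2_term = log2_M - n * (I_XB - 2 * delta)     # log2 of M * 2^{-n[I-2delta]}
    terms.append(2.0 ** log2_term)                  # equals 2^{-n*delta}
```

The remaining $\epsilon$-dependent terms are handled by the typicality parameters, which can also be made arbitrarily small with growing $n$.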
See also
References