Data Transmission and Channel Capacity

Chapter 4 discusses data transmission and channel capacity, focusing on reliable transmission methods that minimize errors in communication over noisy channels. It defines discrete memoryless channels and various types of channels, including binary symmetric and erasure channels, along with their transition matrices. The chapter also introduces fixed-length data transmission codes and the average probability of error associated with these codes.
Chapter 4

Data Transmission and Channel Capacity

Po-Ning Chen, Professor

Institute of Communications Engineering

National Chiao Tung University

Hsin Chu, Taiwan 30010, R.O.C.


Principle of Data Transmission I: 4-1

• Data transmission
– To carefully select codewords from the set of channel input words (of a given
length) so that a minimal ambiguity is obtained at the channel receiver.
• E.g., to transmit a binary message through the following channel, in which inputs 00 and 01 are both received as 0, and inputs 10 and 11 are both received as 1, each with probability 1:

      00 → 0 ← 01
      10 → 1 ← 11

The code (00 for event A, 10 for event B) obviously induces less ambiguity at
the receiver than the code (00 for event A, 01 for event B).
Reliable Transmission I: 4-2

• Definition of “reliable” transmission


– The message can be transmitted with arbitrarily small error.
• Objective of data transmission
– To transform a noisy channel into a reliable medium for sending messages
and recovering them at the receiver.
• How?
– By taking advantage of the common parts between the sender and the
receiver sites that are least affected by the channel noise.
– We will see that these common parts are probabilistically captured by
the mutual information between the channel input and the channel
output.
Notations I: 4-3

W → [Channel Encoder] → X^n → [Channel P_{Y^n|X^n}(·|·)] → Y^n → [Channel Decoder] → Ŵ

• A data transmission system, where

– W represents the message for transmission,
– X^n = (X_1, . . . , X_n) denotes the codeword corresponding to the message W,
– Y^n = (Y_1, . . . , Y_n) represents the received vector due to channel input X^n,
– Ŵ denotes the reconstructed message from Y^n.
Query? I: 4-4

• What is the maximum amount of information (per channel input) that can be
reliably transmitted via a given noisy channel?
– E.g., we can transmit 1 bit per channel use with the following code over the channel of the previous example (00 → 0, 01 → 0, 10 → 1, 11 → 1):

      Code = (00 for event A, 10 for event B)
Discrete memoryless channels I: 4-5

Definition 4.1 (Discrete channel) A discrete communication channel is characterized by
• A finite input alphabet X.
• A finite output alphabet Y.
• A sequence of n-dimensional transition distributions {P_{Y^n|X^n}(y^n|x^n)}_{n=1}^∞ such that

      Σ_{y^n ∈ Y^n} P_{Y^n|X^n}(y^n|x^n) = 1

for every x^n ∈ X^n, where x^n = (x_1, ···, x_n) ∈ X^n and y^n = (y_1, ···, y_n) ∈ Y^n. We assume that the above sequence of n-dimensional distributions is consistent, i.e.,

      P_{Y^i|X^i}(y^i|x^i) = Σ_{x_{i+1} ∈ X} Σ_{y_{i+1} ∈ Y} P_{X_{i+1}|X^i}(x_{i+1}|x^i) P_{Y^{i+1}|X^{i+1}}(y^{i+1}|x^{i+1})

for every x^i, y^i, P_{X_{i+1}|X^i} and i = 1, 2, ···.


Discrete memoryless channels I: 4-6

Definition 4.2 (Discrete memoryless channel) A discrete memoryless channel (DMC) is a channel whose sequence of transition distributions P_{Y^n|X^n} satisfies

      P_{Y^n|X^n}(y^n|x^n) = ∏_{i=1}^n P_{Y|X}(y_i|x_i)          (4.2.1)

for every n = 1, 2, ···, x^n ∈ X^n and y^n ∈ Y^n. In other words, a DMC is fully described by the channel's transition distribution matrix Q := [p_{x,y}] of size |X| × |Y|, where

      p_{x,y} := P_{Y|X}(y|x)

for x ∈ X, y ∈ Y. Furthermore, the matrix Q is stochastic; i.e., the sum of the entries in each of its rows is equal to 1, since Σ_{y∈Y} p_{x,y} = 1 for all x ∈ X.
Frequently used channels I: 4-7

1. Identity (noiseless) channels: An identity channel has equal-size input and output alphabets (|X| = |Y|) and channel transition probability satisfying

      P_{Y|X}(y|x) = 1 if y = x,  and  0 if y ≠ x.

This is a noiseless or perfect channel, as the channel input is received error-free at the channel output.
Frequently used channels I: 4-8

2. Binary symmetric channels (BSC): input X and output Y are binary;
   0 → 0 and 1 → 1 each with probability 1 − ε, and 0 → 1 and 1 → 0 each with probability ε.

• ε ∈ [0, 1] is called the channel's crossover probability or bit error rate.
• The channel's transition distribution matrix is given by

      Q = [p_{x,y}] = | p_{0,0}  p_{0,1} | = | P_{Y|X}(0|0)  P_{Y|X}(1|0) | = | 1−ε   ε  |
                      | p_{1,0}  p_{1,1} |   | P_{Y|X}(0|1)  P_{Y|X}(1|1) |   |  ε   1−ε |          (4.2.4)

• ε = 0 reduces the BSC to the binary identity (noiseless) channel.


Frequently used channels I: 4-9

• BSC can be explicitly represented via a binary modulo-2 additive noise


channel whose output at time i is the modulo-2 sum of its input and noise
variables:
Yi = Xi ⊕ Zi for i = 1, 2, · · ·
where
– ⊕ denotes addition modulo-2,
– Y_i, X_i and Z_i are the channel output, input and noise, respectively,
– the alphabets X = Y = Z = {0, 1} are all binary,
– X_i ⊥ Z_j (input and noise are independent) for any i, j = 1, 2, ···, and
– the noise process is a Bernoulli(ε) process, i.e., a binary i.i.d. process with Pr[Z = 1] = ε.
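The modulo-2 noise representation above is easy to check by simulation. The following sketch (plain Python, standard library only; the helper name and parameters are our choices) pushes random bits through a simulated BSC and confirms that the empirical bit error rate matches the crossover probability ε:

```python
import random

rng = random.Random(0)

def bsc(bits, eps):
    # Y_i = X_i xor Z_i with Z_i i.i.d. Bernoulli(eps) noise, independent of X_i
    return [x ^ (rng.random() < eps) for x in bits]

n, eps = 100_000, 0.1
x = [rng.randrange(2) for _ in range(n)]
y = bsc(x, eps)
empirical = sum(a != b for a, b in zip(x, y)) / n
print(empirical)   # close to the crossover probability eps = 0.1
```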
Frequently used channels I: 4-10

3. Binary erasure channels (BEC): input X ∈ {0, 1}, output Y ∈ {0, E, 1};
   0 → 0 and 1 → 1 each with probability 1 − α, and 0 → E and 1 → E each with probability α.

• In the BEC, the receiver knows the exact locations of the "error" bits in the received bitstream or codeword, but not their actual values.
• These "error" bits are then declared as "erased" during transmission and are called "erasures."
• The channel transition matrix is given by

      Q = [p_{x,y}] = | p_{0,0}  p_{0,E}  p_{0,1} | = | P_{Y|X}(0|0)  P_{Y|X}(E|0)  P_{Y|X}(1|0) | = | 1−α   α    0  |
                      | p_{1,0}  p_{1,E}  p_{1,1} |   | P_{Y|X}(0|1)  P_{Y|X}(E|1)  P_{Y|X}(1|1) |   |  0    α   1−α |

where 0 ≤ α ≤ 1 is called the channel's erasure probability.
Frequently used channels I: 4-11

4. Binary symmetric erasure channel (BSEC): input X ∈ {0, 1}, output Y ∈ {0, E, 1};
   0 → 0 and 1 → 1 with probability 1 − ε − α, 0 → 1 and 1 → 0 with probability ε, and 0 → E and 1 → E with probability α.

• One can combine the BSC with the BEC to obtain a binary channel with both errors and erasures.
• The channel's transition matrix is given by

      Q = [p_{x,y}] = | p_{0,0}  p_{0,E}  p_{0,1} | = | 1−ε−α   α     ε   |
                      | p_{1,0}  p_{1,E}  p_{1,1} |   |   ε     α   1−ε−α |          (4.2.8)

where ε, α ∈ [0, 1] are the channel's crossover and erasure probabilities, respectively.
• Clearly, setting α = 0 reduces the BSEC to the BSC, and setting ε = 0
reduces the BSEC to the BEC.
Frequently used channels I: 4-12

• More generally, the channel need not have a symmetric property in the sense of having identical transition distributions when input bits 0 or 1 are sent. For example, the channel's transition matrix can be given by

      Q = [p_{x,y}] = | p_{0,0}  p_{0,E}  p_{0,1} | = | 1−ε−α    α       ε    |
                      | p_{1,0}  p_{1,E}  p_{1,1} |   |   ε′     α′   1−ε′−α′ |          (4.2.10)

where in general the probabilities ε′ ≠ ε and α′ ≠ α. We call such a channel an asymmetric channel with errors and erasures.
Frequently used channels I: 4-13

5. q-ary symmetric channels:

• Given an integer q ≥ 2, the q-ary symmetric channel is a nonbinary extension of the BSC; it has alphabets X = Y = {0, 1, ···, q − 1} of size q and channel transition matrix given by

      Q = [p_{x,y}] = | p_{0,0}      p_{0,1}      ···   p_{0,q−1}   |
                      | p_{1,0}      p_{1,1}      ···   p_{1,q−1}   |
                      |   ...          ...        ...      ...      |
                      | p_{q−1,0}    p_{q−1,1}    ···  p_{q−1,q−1}  |

                    = | 1−ε        ε/(q−1)    ···   ε/(q−1) |
                      | ε/(q−1)    1−ε        ···   ε/(q−1) |
                      |   ...        ...      ...     ...   |
                      | ε/(q−1)    ε/(q−1)    ···   1−ε     |          (4.2.11)

where 0 ≤ ε ≤ 1 is the channel's symbol error rate (or probability).
• When q = 2, the channel reduces to the BSC with bit error rate ε, as
expected.
Frequently used channels I: 4-14

• Similar to the BSC, the q-ary symmetric channel can be expressed as a


modulo-q additive noise channel with common input, output and noise
alphabets X = Y = Z = {0, 1, · · · , q − 1} and whose output Yi at time i
is given by
Yi = Xi ⊕q Zi,
for i = 1, 2, · · · , where ⊕q denotes addition modulo-q, and Xi and Zi are
the channel’s input and noise variables, respectively, at time i.
• Here, the noise process {Z_n}_{n=1}^∞ is assumed to be an i.i.d. process with distribution

      Pr[Z = 0] = 1 − ε  and  Pr[Z = a] = ε/(q−1)  ∀ a ∈ {1, ···, q − 1}.
It is also assumed that the input and noise processes are independent from
each other.
Frequently used channels I: 4-15

6. q-ary erasure channels:


• Given an integer q ≥ 2, one can also consider a non-binary extension of
the BEC, yielding the so called q-ary erasure channel. Specifically, this
channel has input and output alphabets given by X = {0, 1, · · · , q − 1}
and Y = {0, 1, · · · , q − 1, E}, respectively, where E denotes an erasure,
and channel transition distribution given by

      P_{Y|X}(y|x) = 1 − α   if y = x, x ∈ X
                     α       if y = E, x ∈ X          (4.2.12)
                     0       if y ≠ x, y ∈ X, x ∈ X

where 0 ≤ α ≤ 1 is the erasure probability.


• As expected, setting q = 2 reduces the channel to the BEC.
4.3 Block codes for data transmission over DMCs I: 4-16

W → [Channel Encoder] → X^n → [Channel P_{Y^n|X^n}(·|·)] → Y^n → [Channel Decoder] → Ŵ

Definition 4.4 (Fixed-length data transmission code) Given positive integers n and M, and a discrete channel with input alphabet X and output alphabet Y, a fixed-length data transmission code (or block code) for this channel with blocklength n and rate (1/n) log2 M message bits per channel symbol (or channel use) is denoted by ∼Cn = (n, M) and consists of:
1. M information messages intended for transmission.
2. An encoding function

      f : {1, 2, . . . , M} → X^n

   yielding codewords f(1), f(2), ···, f(M) ∈ X^n, each of length n. The set of these M codewords is called the codebook, and we also usually write ∼Cn = {f(1), f(2), ···, f(M)} to list the codewords.
3. A decoding function g : Y^n → {1, 2, . . . , M}.
4.3 Block codes for data transmission over DMCs I: 4-17

Definition 4.5 (Average probability of error) The average probability of error for a channel block code ∼Cn = (n, M) with encoder f(·) and decoder g(·) used over a channel with transition distribution P_{Y^n|X^n} is defined as

      Pe(∼Cn) := (1/M) Σ_{w=1}^M λ_w(∼Cn),

where

      λ_w(∼Cn) := Pr[Ŵ ≠ W | W = w] = Pr[g(Y^n) ≠ w | X^n = f(w)]
                = Σ_{y^n ∈ Y^n : g(y^n) ≠ w} P_{Y^n|X^n}(y^n|f(w))

is the code's conditional probability of decoding error given that message w is sent over the channel.
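For small codes, Definition 4.5 can be evaluated exactly by enumerating every channel output. The sketch below uses a hypothetical two-codeword code {00, 11} over a BSC with ε = 0.1 and a minimum-distance decoder (all choices ours, for illustration only), and computes each λ_w and the average Pe:

```python
from itertools import product

eps = 0.1
code = {1: (0, 0), 2: (1, 1)}      # hypothetical toy codebook: f(1) = 00, f(2) = 11

def p_channel(y, x):
    # memoryless BSC(eps): P_{Y^n|X^n}(y^n|x^n) = prod_i P_{Y|X}(y_i|x_i)
    p = 1.0
    for yi, xi in zip(y, x):
        p *= (1 - eps) if yi == xi else eps
    return p

def g(y):
    # minimum Hamming-distance decoder (ties broken toward message 1)
    return min(code, key=lambda w: sum(a != b for a, b in zip(y, code[w])))

# lambda_w = sum of P(y^n | f(w)) over all y^n decoded to some other message
lam = {w: sum(p_channel(y, cw) for y in product((0, 1), repeat=2) if g(y) != w)
       for w, cw in code.items()}
Pe = sum(lam.values()) / len(code)
print(lam, Pe)   # lambda_1 = eps^2 = 0.01, lambda_2 = 0.19, Pe = 0.10
```

Note the asymmetry between λ_1 and λ_2 here comes purely from the decoder's tie-breaking rule, a reminder that λ_w depends on g(·), not just on the codewords.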
4.3 Block codes for data transmission over DMCs I: 4-18

Observation 4.6 Another more conservative error criterion is the so-called maximal probability of error

      λ(∼Cn) := max_{w ∈ {1,2,···,M}} λ_w(∼Cn).

Clearly,

      Pe(∼Cn = (n, M)) ≤ λ(∼Cn = (n, M)).

However,

      2 × Pe(∼Cn = (n, M)) ≥ λ(∼C′n = (n, M/2)),

where ∼C′n is constructed by throwing away from ∼Cn the half of its codewords with the largest conditional probabilities of error λ_w(∼Cn). So

      (1/2) λ(∼C′n) ≤ Pe(∼Cn) ≤ λ(∼Cn)

with code rates

      R = (1/n) log2(M)  and  R′ = (1/n) log2(M/2) = R − 1/n.

Consequently, a reliable transmission rate R under the average probability of error criterion is also a reliable transmission rate under the maximal probability of error criterion.
4.3 Block codes for data transmission over DMCs I: 4-19

Definition 4.7 (Jointly typical set) The set Fn(δ) of jointly δ-typical n-tuple pairs (x^n, y^n) with respect to the memoryless distribution

      P_{X^n,Y^n}(x^n, y^n) = ∏_{i=1}^n P_{X,Y}(x_i, y_i)

is defined by

      Fn(δ) := { (x^n, y^n) ∈ X^n × Y^n :
                 | −(1/n) log2 P_{X^n}(x^n) − H(X) | < δ,
                 | −(1/n) log2 P_{Y^n}(y^n) − H(Y) | < δ,
             and | −(1/n) log2 P_{X^n,Y^n}(x^n, y^n) − H(X, Y) | < δ }.

In short, a pair (x^n, y^n) generated by independently drawing n times under P_{X,Y} is jointly δ-typical if its joint and marginal empirical entropies are respectively δ-close to the true joint and marginal entropies.
4.3 Block codes for data transmission over DMCs I: 4-20

Theorem 4.8 (Joint AEP) If (X_1, Y_1), (X_2, Y_2), . . ., (X_n, Y_n), . . . are i.i.d., i.e., {(X_i, Y_i)}_{i=1}^∞ is a dependent pair of DMSs, then

      −(1/n) log2 P_{X^n}(X_1, X_2, . . . , X_n) → H(X)   in probability,
      −(1/n) log2 P_{Y^n}(Y_1, Y_2, . . . , Y_n) → H(Y)   in probability,

and

      −(1/n) log2 P_{X^n,Y^n}((X_1, Y_1), . . . , (X_n, Y_n)) → H(X, Y)   in probability

as n → ∞.

Proof: By the weak law of large numbers, we have the desired result. □
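The convergence in Theorem 4.8 is easy to visualize by simulation. In this sketch (the joint pmf `pxy` is an arbitrary choice of ours), n i.i.d. pairs are drawn and the three normalized log-likelihoods are compared against the true entropies:

```python
import math, random

rng = random.Random(1)

# a hypothetical joint pmf P_{X,Y} on {0,1} x {0,1} (our arbitrary choice)
pxy = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.2}
px = {x: sum(p for (a, _), p in pxy.items() if a == x) for x in (0, 1)}
py = {y: sum(p for (_, b), p in pxy.items() if b == y) for y in (0, 1)}

H = lambda dist: -sum(p * math.log2(p) for p in dist.values() if p > 0)

n = 50_000
pairs = rng.choices(list(pxy), weights=list(pxy.values()), k=n)
hx = -sum(math.log2(px[a]) for a, _ in pairs) / n      # -(1/n) log2 P_{X^n}(x^n)
hy = -sum(math.log2(py[b]) for _, b in pairs) / n      # -(1/n) log2 P_{Y^n}(y^n)
hxy = -sum(math.log2(pxy[ab]) for ab in pairs) / n     # -(1/n) log2 P_{X^n,Y^n}(x^n,y^n)
print(hx - H(px), hy - H(py), hxy - H(pxy))            # all close to 0
```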
4.3 Block codes for data transmission over DMCs I: 4-21

Theorem 4.9 (Shannon-McMillan-Breiman theorem for pairs) Given


a dependent pair of DMSs with joint entropy H(X, Y ) and any δ greater than zero,
we can choose n big enough so that the jointly δ-typical set satisfies:
1. P_{X^n,Y^n}(F_n^c(δ)) < δ for sufficiently large n.
2. The number of elements in Fn(δ) is at least (1 − δ) 2^{n(H(X,Y)−δ)} for sufficiently large n, and at most 2^{n(H(X,Y)+δ)} for every n.
3. If (x^n, y^n) ∈ Fn(δ), its probability of occurrence satisfies

      2^{−n(H(X,Y)+δ)} < P_{X^n,Y^n}(x^n, y^n) < 2^{−n(H(X,Y)−δ)}.

Proof: The proof is quite similar to that of the Shannon-McMillan-Breiman theorem for a single memoryless source presented in the previous chapter; we hence leave it as an exercise. □
4.3 Block codes for data transmission over DMCs I: 4-22

Definition 4.10 (Operational capacity) A rate R is said to be achievable for a discrete channel if there exists a sequence of (n, Mn) channel codes ∼Cn with

      liminf_{n→∞} (1/n) log2 Mn ≥ R   and   lim_{n→∞} Pe(∼Cn) = 0.

The channel's operational capacity, Cop, is the supremum of all achievable rates:

      Cop = sup{R : R is achievable}.

• The next theorem shows Cop = C, i.e., the information capacity is equal
to the operational capacity.
4.3 Block codes for data transmission over DMCs I: 4-23

Theorem 4.11 (Shannon's channel coding theorem) Consider a DMC with finite input alphabet X, finite output alphabet Y and transition distribution probability P_{Y|X}(y|x), x ∈ X and y ∈ Y. Define the channel capacity (or information capacity)

      C := max_{P_X} I(X; Y) = max_{P_X} I(P_X, P_{Y|X})

where the maximum is taken over all input distributions P_X. Then the following hold.

• Forward part (achievability): For any 0 < ε < 1, there exist γ > 0 and a sequence of data transmission block codes {∼Cn = (n, Mn)}_{n=1}^∞ with

      C > liminf_{n→∞} (1/n) log2 Mn ≥ C − γ

and

      Pe(∼Cn) < ε for sufficiently large n,

where Pe(∼Cn) denotes the (average) probability of error for block code ∼Cn.
4.3 Block codes for data transmission over DMCs I: 4-24

• Converse part: For any 0 < ε < 1, any sequence of data transmission block codes {∼Cn = (n, Mn)}_{n=1}^∞ with

      liminf_{n→∞} (1/n) log2 Mn > C

satisfies

      Pe(∼Cn) > (1 − ε)µ for sufficiently large n,          (4.3.1)

where

      µ = 1 − C / (liminf_{n→∞} (1/n) log2 Mn) > 0,

i.e., the codes' probability of error is bounded away from zero for all n sufficiently large.

Notes:
• (4.3.1) actually implies that

      liminf_{n→∞} Pe(∼Cn) ≥ lim_{ε↓0} (1 − ε)µ = µ,

where the error probability lower bound has nothing to do with ε. Here we state the converse of Theorem 4.11 in a form parallel to the converse statements in Theorems 3.6 and 3.15.
4.3 Block codes for data transmission over DMCs I: 4-25

• Also note that the mutual information I(X; Y) is actually a function of the input statistics P_X and the channel statistics P_{Y|X}. Hence, we may write it as

      I(P_X, P_{Y|X}) = Σ_{x∈X} Σ_{y∈Y} P_X(x) P_{Y|X}(y|x) log2 [ P_{Y|X}(y|x) / Σ_{x′∈X} P_X(x′) P_{Y|X}(y|x′) ].

Such an expression is more suitable for calculating the channel capacity.


• Channel capacity C is well-defined
– since for a fixed PY |X , I(PX , PY |X ) is concave and continuous in PX
with respect to both the variational distance and the Euclidean distance
(i.e., L2-distance) [415, Chapter 2], and
– since the set of all input distributions PX is a compact (closed and bounded)
subset of R|X | due to the finiteness of X .
For the above two reasons, there must exist a PX that achieves the supremum
of the mutual information and the maximum is attainable.
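This expression for I(P_X, P_{Y|X}) is the basis of the Blahut-Arimoto algorithm, a standard fixed-point iteration for computing C = max_{P_X} I(P_X, P_{Y|X}) numerically. A minimal sketch (not from the text; the iteration count and the BSC test case are our choices):

```python
import math

def capacity(Q, iters=200):
    """Blahut-Arimoto iteration for C = max_{P_X} I(P_X, P_{Y|X}).
    Q[x][y] = P_{Y|X}(y|x); returns the capacity in bits (to iteration accuracy)."""
    nx, ny = len(Q), len(Q[0])
    p = [1.0 / nx] * nx                              # start from the uniform input
    z = 1.0
    for _ in range(iters):
        # output distribution induced by the current input distribution
        q = [sum(p[x] * Q[x][y] for x in range(nx)) for y in range(ny)]
        # c[x] = exp( D( Q(.|x) || q ) ), computed in nats
        c = [math.exp(sum(Q[x][y] * math.log(Q[x][y] / q[y])
                          for y in range(ny) if Q[x][y] > 0)) for x in range(nx)]
        z = sum(p[x] * c[x] for x in range(nx))      # at the fixed point, ln z = C
        p = [p[x] * c[x] / z for x in range(nx)]
    return math.log2(z)

eps = 0.11
bsc = [[1 - eps, eps], [eps, 1 - eps]]
hb = -(eps * math.log2(eps) + (1 - eps) * math.log2(1 - eps))
print(capacity(bsc), 1 - hb)   # both close to 0.5001
```

For the BSC the uniform input is already the fixed point, so the iteration returns the known closed form 1 − hb(ε) essentially exactly; for channels without such symmetry the iteration converges to C from below.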
4.3 Block codes for data transmission over DMCs I: 4-26

Idea behind the proof of the forward part:


• It suffices to prove the existence of a good block code sequence, satisfying the rate condition

      liminf_{n→∞} (1/n) log2 Mn ≥ C − γ

for some γ > 0, whose average error probability is ultimately less than ε.
• Random coding argument:
– The desired good block code sequence is not deterministically constructed;
– instead, its existence is implicitly proven by showing that for a class (ensemble) of block code sequences {∼Cn}_{n=1}^∞ and a code-selecting distribution Pr[∼Cn] over these block code sequences, the expected value of the average error probability, evaluated under the code-selecting distribution on these block code sequences, can be made smaller than ε for n sufficiently large:

      E_{∼Cn}[Pe(∼Cn)] = Σ_{∼Cn} Pr[∼Cn] Pe(∼Cn) → 0 as n → ∞.

– Hence, there must exist at least one desired good code sequence {∼Cn*}_{n=1}^∞ among them (with Pe(∼Cn*) → 0 as n → ∞).
4.3 Block codes for data transmission over DMCs I: 4-27

Proof of the forward part:


• Since the forward part holds trivially when C = 0 by setting Mn = 1, we
assume in the sequel that C > 0.
• Fix ε ∈ (0, 1) and some γ with 0 < γ < min{4ε, C}.
• Observe that there exists N0 such that for n > N0, we can choose an integer Mn with

      C − γ/2 ≥ (1/n) log2 Mn > C − γ.          (4.3.2)

(Since we are only concerned with the case of "sufficiently large n," it suffices to consider only those n's satisfying n > N0, and ignore those n's for n ≤ N0.)
• Define δ := γ/8.
4.3 Block codes for data transmission over DMCs I: 4-28

• Let P_X̂ be the probability distribution achieving the channel capacity:

      C := max_{P_X} I(P_X, P_{Y|X}) = I(P_X̂, P_{Y|X}).

Denote by P_Ŷ^n the channel output distribution due to the channel input product distribution P_X̂^n with P_X̂^n(x^n) = ∏_{i=1}^n P_X̂(x_i); in other words,

      P_Ŷ^n(y^n) = Σ_{x^n ∈ X^n} P_{X̂^n,Ŷ^n}(x^n, y^n)

and

      P_{X̂^n,Ŷ^n}(x^n, y^n) := P_X̂^n(x^n) P_{Y^n|X^n}(y^n|x^n)

for all x^n ∈ X^n and y^n ∈ Y^n.

– Note that since P_X̂^n(x^n) = ∏_{i=1}^n P_X̂(x_i) and the channel is memoryless, the resulting joint input-output process {(X̂_i, Ŷ_i)}_{i=1}^∞ is also memoryless with

      P_{X̂^n,Ŷ^n}(x^n, y^n) = ∏_{i=1}^n P_{X̂,Ŷ}(x_i, y_i)

and

      P_{X̂,Ŷ}(x, y) = P_X̂(x) P_{Y|X}(y|x) for x ∈ X, y ∈ Y.

We next present the proof in three steps.


4.3 Block codes for data transmission over DMCs I: 4-29

Step 1: Code construction.

• For any blocklength n, independently select Mn channel inputs with replacement from X^n according to the distribution P_X̂^n(x^n).
• For the selected Mn channel inputs yielding codebook

      ∼Cn := {c_1, c_2, . . . , c_Mn},

define the encoder fn(·) and decoder gn(·), respectively, as follows:

      fn(m) = c_m for 1 ≤ m ≤ Mn,

and

      gn(y^n) = m,                           if c_m is the only codeword in ∼Cn
                                             satisfying (c_m, y^n) ∈ Fn(δ);
                any one in {1, 2, . . . , Mn}, otherwise,

where Fn(δ) is defined in Definition 4.7 with respect to the distribution P_{X̂^n,Ŷ^n}. (We assume that the codebook ∼Cn and the channel distribution P_{Y|X} are known at both the encoder and the decoder.)
4.3 Block codes for data transmission over DMCs I: 4-30


Fn (δ) := (xn, y n ) ∈ X n × Y n :
   
 1   1 
− log2 PX n (xn) − H(X) < δ, − log2 PY n (y n ) − H(Y ) < δ,
 n   n 
  
 1 
and − log2 PX n,Y n (xn, y n ) − H(X, Y ) < δ .
n

• Again, let me repeat the encoding and decoding process here!


– A message W is chosen according to the uniform distribution from the
set of messages.
– The encoder fn then transmits the W th codeword cW in ∼Cn over the
channel.
– Then Y n is received at the channel output and the decoder guesses the
sent message via Ŵ = gn(Y n).
– Note that there is a total |X |nMn possible randomly generated codebooks
∼Cn and the probability of selecting each codebook is given by
Mn

Pr[∼Cn] = PX̂ n (cm).
m=1
4.3 Block codes for data transmission over DMCs I: 4-31

Step 2: Conditional error probability.

• For each (randomly generated) data transmission code ∼Cn, the conditional probability of error given that message m was sent, λ_m(∼Cn), can be upper bounded by:

      λ_m(∼Cn) ≤ Σ_{y^n ∈ Y^n : (c_m, y^n) ∉ Fn(δ)} P_{Y^n|X^n}(y^n|c_m)
               + Σ_{m′=1, m′≠m}^{Mn} Σ_{y^n ∈ Y^n : (c_{m′}, y^n) ∈ Fn(δ)} P_{Y^n|X^n}(y^n|c_m),          (4.3.3)

where
– the first term in (4.3.3) considers the case that the received channel output y^n is not jointly δ-typical with c_m (and hence, the decoding rule gn(·) could possibly result in a wrong guess), and
– the second term in (4.3.3) reflects the situation when y^n is jointly δ-typical not only with the transmitted codeword c_m, but also with another codeword c_{m′} (which may cause a decoding error).
4.3 Block codes for data transmission over DMCs I: 4-32

• By taking the expectation in (4.3.3) with respect to the mth codeword-selecting distribution P_X̂^n(c_m), we obtain

      Σ_{c_m ∈ X^n} P_X̂^n(c_m) λ_m(∼Cn)
        ≤ Σ_{c_m ∈ X^n} Σ_{y^n ∉ Fn(δ|c_m)} P_X̂^n(c_m) P_{Y^n|X^n}(y^n|c_m)
          + Σ_{c_m ∈ X^n} Σ_{m′=1, m′≠m}^{Mn} Σ_{y^n ∈ Fn(δ|c_{m′})} P_X̂^n(c_m) P_{Y^n|X^n}(y^n|c_m)
        = P_{X̂^n,Ŷ^n}(F_n^c(δ)) + Σ_{m′=1, m′≠m}^{Mn} Σ_{c_m ∈ X^n} Σ_{y^n ∈ Fn(δ|c_{m′})} P_{X̂^n,Ŷ^n}(c_m, y^n),          (4.3.4)

where

      Fn(δ|x^n) := {y^n ∈ Y^n : (x^n, y^n) ∈ Fn(δ)}.
4.3 Block codes for data transmission over DMCs I: 4-33

Step 3: Average error probability.

      E_{∼Cn}[Pe(∼Cn)] = Σ_{∼Cn} Pr[∼Cn] Pe(∼Cn)
        = Σ_{c_1 ∈ X^n} ··· Σ_{c_Mn ∈ X^n} P_X̂^n(c_1) ··· P_X̂^n(c_Mn) [ (1/Mn) Σ_{m=1}^{Mn} λ_m(∼Cn) ]
        = (1/Mn) Σ_{m=1}^{Mn} Σ_{c_1 ∈ X^n} ··· Σ_{c_{m−1} ∈ X^n} Σ_{c_{m+1} ∈ X^n} ··· Σ_{c_Mn ∈ X^n}
              P_X̂^n(c_1) ··· P_X̂^n(c_{m−1}) P_X̂^n(c_{m+1}) ··· P_X̂^n(c_Mn)
              × [ Σ_{c_m ∈ X^n} P_X̂^n(c_m) λ_m(∼Cn) ]
        ≤ (1/Mn) Σ_{m=1}^{Mn} Σ_{c_1} ··· Σ_{c_{m−1}} Σ_{c_{m+1}} ··· Σ_{c_Mn}
              P_X̂^n(c_1) ··· P_X̂^n(c_{m−1}) P_X̂^n(c_{m+1}) ··· P_X̂^n(c_Mn) × P_{X̂^n,Ŷ^n}(F_n^c(δ))
          + (1/Mn) Σ_{m=1}^{Mn} Σ_{c_1} ··· Σ_{c_{m−1}} Σ_{c_{m+1}} ··· Σ_{c_Mn}
              P_X̂^n(c_1) ··· P_X̂^n(c_{m−1}) P_X̂^n(c_{m+1}) ··· P_X̂^n(c_Mn)
              × Σ_{m′=1, m′≠m}^{Mn} Σ_{c_m ∈ X^n} Σ_{y^n ∈ Fn(δ|c_{m′})} P_{X̂^n,Ŷ^n}(c_m, y^n)          (4.3.5)
        = P_{X̂^n,Ŷ^n}(F_n^c(δ))
          + (1/Mn) Σ_{m=1}^{Mn} Σ_{m′=1, m′≠m}^{Mn} Σ_{c_1} ··· Σ_{c_{m−1}} Σ_{c_{m+1}} ··· Σ_{c_Mn}
              P_X̂^n(c_1) ··· P_X̂^n(c_{m−1}) P_X̂^n(c_{m+1}) ··· P_X̂^n(c_Mn)
              × [ Σ_{c_m ∈ X^n} Σ_{y^n ∈ Fn(δ|c_{m′})} P_{X̂^n,Ŷ^n}(c_m, y^n) ],

where (4.3.5) follows from (4.3.4), and the last step holds since P_{X̂^n,Ŷ^n}(F_n^c(δ)) is a constant independent of c_1, . . ., c_Mn and m.
4.3 Block codes for data transmission over DMCs I: 4-36


(Then for n > N0,)

      Σ_{m′=1, m′≠m}^{Mn} Σ_{c_1} ··· Σ_{c_{m−1}} Σ_{c_{m+1}} ··· Σ_{c_Mn}
          P_X̂^n(c_1) ··· P_X̂^n(c_{m−1}) P_X̂^n(c_{m+1}) ··· P_X̂^n(c_Mn)
          × [ Σ_{c_m ∈ X^n} Σ_{y^n ∈ Fn(δ|c_{m′})} P_{X̂^n,Ŷ^n}(c_m, y^n) ]
        = Σ_{m′=1, m′≠m}^{Mn} Σ_{c_{m′} ∈ X^n} P_X̂^n(c_{m′}) Σ_{c_m ∈ X^n} Σ_{y^n ∈ Fn(δ|c_{m′})} P_{X̂^n,Ŷ^n}(c_m, y^n)
        = Σ_{m′=1, m′≠m}^{Mn} Σ_{c_{m′} ∈ X^n} P_X̂^n(c_{m′}) Σ_{y^n ∈ Fn(δ|c_{m′})} [ Σ_{c_m ∈ X^n} P_{X̂^n,Ŷ^n}(c_m, y^n) ]
        = Σ_{m′=1, m′≠m}^{Mn} Σ_{c_{m′} ∈ X^n} Σ_{y^n ∈ Fn(δ|c_{m′})} P_X̂^n(c_{m′}) P_Ŷ^n(y^n)
        = Σ_{m′=1, m′≠m}^{Mn} Σ_{(c_{m′}, y^n) ∈ Fn(δ)} P_X̂^n(c_{m′}) P_Ŷ^n(y^n)
        ≤ Σ_{m′=1, m′≠m}^{Mn} |Fn(δ)| 2^{−n(H(X̂)−δ)} 2^{−n(H(Ŷ)−δ)}
        ≤ Σ_{m′=1, m′≠m}^{Mn} 2^{n(H(X̂,Ŷ)+δ)} 2^{−n(H(X̂)−δ)} 2^{−n(H(Ŷ)−δ)}
        = (Mn − 1) 2^{n(H(X̂,Ŷ)+δ)} 2^{−n(H(X̂)−δ)} 2^{−n(H(Ŷ)−δ)}
        < Mn · 2^{n(H(X̂,Ŷ)+δ)} 2^{−n(H(X̂)−δ)} 2^{−n(H(Ŷ)−δ)}
        ≤ 2^{n(C−4δ)} · 2^{−n(I(X̂;Ŷ)−3δ)} = 2^{−nδ},

where
– the 1st inequality follows from the definition of the jointly typical set Fn(δ),
– the 2nd inequality holds by the Shannon-McMillan-Breiman theorem for pairs (Theorem 4.9), and
– the last inequality follows since C = I(X̂; Ŷ) by definition of X̂ and Ŷ, and since (1/n) log2 Mn ≤ C − (γ/2) = C − 4δ.
4.3 Block codes for data transmission over DMCs I: 4-38

Consequently,

      E_{∼Cn}[Pe(∼Cn)] ≤ P_{X̂^n,Ŷ^n}(F_n^c(δ)) + 2^{−nδ},

which for sufficiently large n (and n > N0) can be made smaller than 2δ = γ/4 < ε by the Shannon-McMillan-Breiman theorem for pairs. □
Fano’s inequality I: 4-39

Relation between Fano’s inequality and converse proof:


• Consider an (n, Mn) channel block code ∼Cn with encoding and decoding functions given respectively by

      fn : {1, 2, ···, Mn} → X^n   and   gn : Y^n → {1, 2, ···, Mn}.
• Let message W, which is uniformly distributed over the set of messages {1, 2, ···, Mn}, be sent via codeword X^n(W) = fn(W) over the DMC.
• Let Y n be received at the channel output.
• At the receiver, the decoder estimates the sent message via Ŵ = gn (Y n).
• The probability of estimation error is given by the code's average error probability:

      Pr[W ≠ Ŵ] = Pe(∼Cn).

• Then Fano's inequality yields

      H(W|Y^n) ≤ 1 + Pe(∼Cn) log2(Mn − 1)
               < 1 + Pe(∼Cn) log2 Mn.          (4.3.6)
4.3 Block codes for data transmission over DMCs I: 4-40

Proof of the converse part:


• For any (n, Mn) block channel code ∼Cn as described above, we have that
W → Xn → Y n
form a Markov chain; we thus obtain by the data processing inequality that
I(W ; Y n ) ≤ I(X n; Y n ). (4.3.7)

• We can also upper bound I(X^n; Y^n) in terms of the channel capacity C as follows:

      I(X^n; Y^n) ≤ max_{P_X^n} I(X^n; Y^n)
                  ≤ max_{P_X^n} Σ_{i=1}^n I(X_i; Y_i)   (by Theorem 2.21: Bounds on mutual information)
                  ≤ Σ_{i=1}^n max_{P_{X_i}} I(X_i; Y_i)
                  = nC.          (4.3.8)
4.3 Block codes for data transmission over DMCs I: 4-41

• Consequently, code ∼Cn satisfies the following:

      log2 Mn = H(W)   (since W is uniformly distributed)
              = H(W|Y^n) + I(W; Y^n)
              ≤ H(W|Y^n) + I(X^n; Y^n)   (by (4.3.7))
              ≤ H(W|Y^n) + nC   (by (4.3.8))
              < 1 + Pe(∼Cn) · log2 Mn + nC.   (by (4.3.6))

• This implies that

      Pe(∼Cn) > 1 − C/((1/n) log2 Mn) − 1/log2 Mn = 1 − (C + 1/n)/((1/n) log2 Mn).

• So if

      liminf_{n→∞} (1/n) log2 Mn = C/(1 − µ),

then for any 0 < ε < 1, there exists an integer N such that for n ≥ N,

      (1/n) log2 Mn ≥ (C + 1/n)/(1 − (1 − ε)µ),          (4.3.9)

because, otherwise, (4.3.9) would be violated for infinitely many n, implying the contradiction that

      liminf_{n→∞} (1/n) log2 Mn ≤ liminf_{n→∞} (C + 1/n)/(1 − (1 − ε)µ) = C/(1 − (1 − ε)µ) < C/(1 − µ),

where the last inequality holds since ε > 0 and µ > 0.
4.3 Block codes for data transmission over DMCs I: 4-42

• Hence, for n ≥ N ,
C + 1/n
Pe(∼Cn) > 1 − [1 − (1 − ε)µ] = (1 − )µ > 0;
C + 1/n
i.e., Pe(∼Cn ) is bounded away from zero for n sufficiently large. 2

• Converse part: For any 0 < ε < 1, any sequence of data transmission block codes {∼Cn = (n, Mn)}_{n=1}^∞ with

      R = liminf_{n→∞} (1/n) log2 Mn > C

satisfies

      Pe(∼Cn) > (1 − ε)µ for sufficiently large n,

where

      µ = 1 − C / (liminf_{n→∞} (1/n) log2 Mn) = 1 − C/R > 0,

i.e., the codes' probability of error is bounded away from zero for all n sufficiently large.
4.3 Block codes for data transmission over DMCs I: 4-43

[Figure: the error-probability lower bound µ = 1 − C/R, plotted as a function of the rate R; µ = 0 for R ≤ C and increases toward 1 as R grows beyond C. When R > C, Pe(Cn) > (1 − ε)µ is bounded away from 0 for n sufficiently large.]
4.3 Block codes for data transmission over DMCs I: 4-44

Observation 4.12

• The results of the above channel coding theorem are illustrated in the figure below, where

      R = liminf_{n→∞} Rn = liminf_{n→∞} (1/n) log2 Mn   message bits/channel use

is usually called the asymptotic coding rate of channel block codes, and Rn is the code rate for codes of blocklength n.

      [Figure: the rate axis splits at R = C; for R < C, lim sup_{n→∞} Pe(Cn) = 0 for the best channel block code, while for R > C, lim sup_{n→∞} Pe(Cn) > 0 for all channel block codes.]

– Note that Theorem 4.11 actually indicates

      lim_{n→∞} Pe(Cn) = 0,      for R < C;
      liminf_{n→∞} Pe(Cn) > 0,   for R > C.

– Such a "two-region" behavior however only holds for a DMC.


4.3 Block codes for data transmission over DMCs I: 4-45

– For a more general channel, three regions instead of two may result, i.e.,

      (i) R < C,  (ii) C < R < C̄,  and  (iii) R > C̄,

which respectively correspond to

      (i) lim sup_{n→∞} Pe(Cn) = 0 for the best block code,
      (ii) lim sup_{n→∞} Pe(Cn) > 0 but liminf_{n→∞} Pe(Cn) = 0 for the best block code, and
      (iii) liminf_{n→∞} Pe(Cn) > 0 for all channel block codes,

where C̄ is named the optimistic channel capacity.

– Since C̄ = C for DMCs, the three regions are thus reduced to two.
4.5 Calculating channel capacity I: 4-46

4.5.1 Symmetric, Weakly Symmetric, and Quasi-symmetric Channels

Definition 4.15
• A DMC with finite input alphabet X , finite output alphabet Y and channel
transition matrix Q = [px,y ] of size |X |×|Y| is said to be symmetric if the rows
of Q are permutations of each other and the columns of Q are permutations of
each other.
• The channel is said to be weakly-symmetric if the rows of Q are permutations
of each other and all the column sums in Q are equal.

Example of symmetric channel: A ternary DMC with X = Y = {0, 1, 2} and transition matrix

      Q = | P_{Y|X}(0|0)  P_{Y|X}(1|0)  P_{Y|X}(2|0) |   | 0.4  0.1  0.5 |
          | P_{Y|X}(0|1)  P_{Y|X}(1|1)  P_{Y|X}(2|1) | = | 0.5  0.4  0.1 |
          | P_{Y|X}(0|2)  P_{Y|X}(1|2)  P_{Y|X}(2|2) |   | 0.1  0.5  0.4 |
4.5 Calculating channel capacity I: 4-47

Example of weakly symmetric but non-symmetric channel: A quaternary DMC with |X| = |Y| = 4 and

      Q = | 0.5  0.25  0.25  0   |
          | 0.5  0.25  0.25  0   |          (4.5.1)
          | 0    0.25  0.25  0.5 |
          | 0    0.25  0.25  0.5 |

is weakly-symmetric (but not symmetric).

Lemma 4.16 The capacity of a weakly-symmetric channel Q is achieved by a uniform input distribution and is given by

      C = log2 |Y| − H(q_1, q_2, ···, q_|Y|)          (4.5.3)

where (q_1, q_2, ···, q_|Y|) denotes any row of Q and

      H(q_1, q_2, ···, q_|Y|) := − Σ_{i=1}^{|Y|} q_i log2 q_i

is the row entropy.
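Applying (4.5.3) to the weakly-symmetric matrix in (4.5.1) takes only a few lines. A sketch (plain Python; the helper name is ours):

```python
import math

def weakly_symmetric_capacity(Q):
    # Lemma 4.16: C = log2 |Y| - H(any row of Q)
    row = Q[0]
    H_row = -sum(p * math.log2(p) for p in row if p > 0)
    return math.log2(len(row)) - H_row

# the weakly-symmetric (but non-symmetric) matrix from (4.5.1)
Q = [[0.5, 0.25, 0.25, 0.0],
     [0.5, 0.25, 0.25, 0.0],
     [0.0, 0.25, 0.25, 0.5],
     [0.0, 0.25, 0.25, 0.5]]
print(weakly_symmetric_capacity(Q))   # log2(4) - 1.5 = 0.5 bits
```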


4.5 Calculating channel capacity I: 4-48

Proof:
• The mutual information between the channel's input and output is given by

      I(X; Y) = H(Y) − H(Y|X) = H(Y) − Σ_{x∈X} P_X(x) H(Y|X = x)

where

      H(Y|X = x) = − Σ_{y∈Y} P_{Y|X}(y|x) log2 P_{Y|X}(y|x) = − Σ_{y∈Y} p_{x,y} log2 p_{x,y}.

• Noting that every row of Q is a permutation of every other row, we obtain that H(Y|X = x) is independent of x and can be written as

      H(Y|X = x) = H(q_1, q_2, ···, q_|Y|)

where (q_1, q_2, ···, q_|Y|) is any row of Q.

• Thus

      H(Y|X) = Σ_{x∈X} P_X(x) H(q_1, q_2, ···, q_|Y|) = H(q_1, q_2, ···, q_|Y|) Σ_{x∈X} P_X(x) = H(q_1, q_2, ···, q_|Y|).

This implies

      I(X; Y) = H(Y) − H(q_1, q_2, ···, q_|Y|) ≤ log2 |Y| − H(q_1, q_2, ···, q_|Y|)

with equality achieved iff Y is uniformly distributed over Y.

• The proof is completed by confirming that for a weakly symmetric channel, the uniform input distribution induces the uniform output distribution (see the text). □
4.5 Calculating channel capacity I: 4-50

Example 4.18 (Capacity of the BSC) Since the BSC with crossover probability (or bit error rate) ε is symmetric, we directly obtain from Lemma 4.16 that its capacity is achieved by a uniform input distribution and is given by

      C = log2(2) − H(1 − ε, ε) = 1 − hb(ε)          (4.5.5)

where hb(·) is the binary entropy function.

Example 4.19 (Capacity of the q-ary symmetric channel) Similarly, the q-ary symmetric channel with symbol error rate ε described in (4.2.11) is symmetric; hence, by Lemma 4.16, its capacity is given by

      C = log2 q − H(1 − ε, ε/(q−1), ···, ε/(q−1))
        = log2 q + ε log2 (ε/(q−1)) + (1 − ε) log2(1 − ε).
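The closed form in Example 4.19 can be coded directly; setting q = 2 should recover the BSC capacity of Example 4.18. A sketch (function names ours):

```python
import math

def qary_symmetric_capacity(q, eps):
    # Example 4.19: C = log2 q + eps*log2(eps/(q-1)) + (1-eps)*log2(1-eps)
    c = math.log2(q)
    if eps > 0:
        c += eps * math.log2(eps / (q - 1))
    if eps < 1:
        c += (1 - eps) * math.log2(1 - eps)
    return c

hb = lambda e: -(e * math.log2(e) + (1 - e) * math.log2(1 - e))
print(qary_symmetric_capacity(2, 0.1), 1 - hb(0.1))   # q = 2 recovers the BSC
```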
Question: Does the uniform input achieve the channel capacity iff
the channel is weakly symmetric? No.
4.5 Calculating channel capacity I: 4-51

Definition 4.20 (Quasi-symmetric channels) A DMC with finite input alphabet X, finite output alphabet Y and channel transition matrix Q = [p_{x,y}] of size |X| × |Y| is said to be quasi-symmetric if Q can be partitioned along its columns into m weakly-symmetric sub-matrices Q_1, Q_2, ···, Q_m for some integer m ≥ 1, where each sub-matrix Q_i has size |X| × |Y_i| for i = 1, 2, ···, m, with Y_1 ∪ ··· ∪ Y_m = Y and Y_i ∩ Y_j = ∅ for all i ≠ j, i, j = 1, 2, ···, m.

Quasi- = “having some, but not all of the features of” such as quasi-scholar and
quasi-official.

• The notion of “quasi-symmetry” we provide here is slightly more general than


Gallager’s notion [135, p. 94], as we herein allow each sub-matrix to be weakly-
symmetric (instead of symmetric as in [135]).
4.5 Calculating channel capacity I: 4-52
Lemma 4.21 The capacity of a quasi-symmetric channel Q is achieved by a uni-
form input distribution and is given by

    C = Σ_{i=1}^{m} ai Ci                    (4.5.6)

where

    ai := Σ_{y∈Yi} px,y = sum of any row in Qi,   i = 1, · · · , m,

and

    Ci := log2 |Yi| − H(any row in the matrix (1/ai) Qi),   i = 1, · · · , m

is the capacity of the ith weakly-symmetric “sub-channel” whose transition matrix
is obtained by multiplying each entry of Qi by 1/ai (this normalization renders sub-
matrix Qi into a stochastic matrix and hence a channel transition matrix).
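Formula (4.5.6) translates directly into code. The following minimal sketch (the partition interface and function names are our own) takes a channel matrix and a list of column-index groups, one per weakly-symmetric sub-matrix, and returns Σ ai Ci; applied to a BEC it returns 1 − α:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def quasi_symmetric_capacity(Q, partition):
    """Capacity via Lemma 4.21: C = sum_i a_i * C_i, where 'partition'
    lists the column indices of each weakly-symmetric sub-matrix Q_i."""
    C = 0.0
    for cols in partition:
        a_i = sum(Q[0][y] for y in cols)       # sum of any row of Q_i
        if a_i == 0:
            continue
        row = [Q[0][y] / a_i for y in cols]    # a row of (1/a_i) Q_i
        C_i = math.log2(len(cols)) - entropy(row)
        C += a_i * C_i
    return C

# BEC with erasure probability alpha: partition the columns into the two
# non-erasure outputs {0, 1} and the erasure output {E}.
alpha = 0.2
Q = [[1 - alpha, alpha, 0.0],
     [0.0, alpha, 1 - alpha]]
C = quasi_symmetric_capacity(Q, [[0, 2], [1]])
assert abs(C - (1 - alpha)) < 1e-12
```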
4.5 Calculating channel capacity I: 4-53
Example 4.22 (Capacity of the BEC) The BEC with erasure probability α
and transition matrix

    Q = [ PY|X(0|0)  PY|X(E|0)  PY|X(1|0) ]  =  [ 1−α   α    0  ]
        [ PY|X(0|1)  PY|X(E|1)  PY|X(1|1) ]     [  0    α   1−α ]

is quasi-symmetric (but neither weakly-symmetric nor symmetric).
• Its transition matrix Q can be partitioned along its columns into two symmetric
(hence weakly-symmetric) sub-matrices

    Q1 = [ 1−α   0  ]      and      Q2 = [ α ]
         [  0   1−α ]                    [ α ].
4.5 Calculating channel capacity I: 4-54
• Thus applying the capacity formula for quasi-symmetric channels of Lemma 4.21
yields that the capacity of the BEC is given by

    C = a1 C1 + a2 C2

where a1 = 1 − α, a2 = α,

    C1 = log2(2) − H((1−α)/(1−α), 0/(1−α)) = 1 − H(1, 0) = 1 − 0 = 1,

and

    C2 = log2(1) − H(α/α) = 0 − 0 = 0.

Therefore, the BEC capacity is given by

    C = (1 − α)(1) + (α)(0) = 1 − α.                    (4.5.7)
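As a cross-check on (4.5.7), one can also maximize I(X;Y) over all binary input distributions by brute force; the maximum should sit at the uniform input with value 1 − α. An illustrative sketch (our own function names):

```python
import math

def mutual_information(p, Q):
    """I(X;Y) in bits for input distribution (p, 1-p) and 2 x |Y| channel Q."""
    px = [p, 1.0 - p]
    ny = len(Q[0])
    py = [sum(px[x] * Q[x][y] for x in range(2)) for y in range(ny)]
    I = 0.0
    for x in range(2):
        for y in range(ny):
            if px[x] > 0 and Q[x][y] > 0:
                I += px[x] * Q[x][y] * math.log2(Q[x][y] / py[y])
    return I

alpha = 0.3
Q = [[1 - alpha, alpha, 0.0],
     [0.0, alpha, 1 - alpha]]

# Grid search over input distributions; the maximum should equal 1 - alpha.
best = max(mutual_information(k / 1000.0, Q) for k in range(1001))
assert abs(best - (1 - alpha)) < 1e-6
```

(For the BEC one can show I(X;Y) = (1 − α) hb(p), so the grid search just confirms that the binary entropy peaks at p = 1/2.)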
4.5 Calculating channel capacity I: 4-55
Example 4.23 (Capacity of the BSEC) Similarly, the BSEC with crossover
probability ε and erasure probability α and transition matrix

    Q = [px,y ] = [ p0,0  p0,E  p0,1 ]  =  [ 1−ε−α   α     ε   ]
                  [ p1,0  p1,E  p1,1 ]     [   ε     α   1−ε−α ]

is quasi-symmetric; its transition matrix can be partitioned along its columns into
two symmetric sub-matrices

    Q1 = [ 1−ε−α     ε   ]      and      Q2 = [ α ]
         [   ε     1−ε−α ]                    [ α ].

Hence by Lemma 4.21, the channel capacity is given by C = a1C1 + a2C2 where
a1 = 1 − α, a2 = α,

    C1 = log2(2) − H((1−ε−α)/(1−α), ε/(1−α)) = 1 − hb((1−ε−α)/(1−α)),

and

    C2 = log2(1) − H(α/α) = 0.
4.5 Calculating channel capacity I: 4-56
We thus obtain that

    C = (1 − α)[1 − hb((1−ε−α)/(1−α))] + (α)(0)
      = (1 − α)[1 − hb((1−ε−α)/(1−α))].                    (4.5.8)
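Two quick consistency checks on (4.5.8), in an illustrative sketch (function names are our own): setting α = 0 must recover the BSC capacity 1 − hb(ε), and setting ε = 0 must recover the BEC capacity 1 − α:

```python
import math

def hb(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsec_capacity(eps, alpha):
    """C = (1 - alpha) * (1 - hb((1 - eps - alpha)/(1 - alpha))), as in (4.5.8)."""
    return (1 - alpha) * (1 - hb((1 - eps - alpha) / (1 - alpha)))

# alpha = 0 recovers the BSC capacity 1 - hb(eps) ...
assert abs(bsec_capacity(0.1, 0.0) - (1 - hb(0.1))) < 1e-12
# ... and eps = 0 recovers the BEC capacity 1 - alpha.
assert abs(bsec_capacity(0.0, 0.25) - 0.75) < 1e-12
```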
4.5.2 Karush-Kuhn-Tucker cond. for chan. capacity I: 4-57
Definition 4.24 (Mutual information for a specific input symbol) The
mutual information for a specific input symbol is defined as:

    I(x; Y) := Σ_{y∈Y} PY|X(y|x) log2 [ PY|X(y|x) / PY(y) ].
From the above definition, the mutual information becomes:

    I(X; Y) = Σ_{x∈X} PX(x) Σ_{y∈Y} PY|X(y|x) log2 [ PY|X(y|x) / PY(y) ]
            = Σ_{x∈X} PX(x) I(x; Y).
4.5.2 Karush-Kuhn-Tucker cond. for chan. capacity I: 4-58
Lemma 4.25 (KKT condition for channel capacity) For a given DMC, an
input distribution PX achieves its channel capacity iff there exists a constant C
such that

    I(x; Y) = C   for all x ∈ X with PX(x) > 0;
    I(x; Y) ≤ C   for all x ∈ X with PX(x) = 0.        (4.5.9)

Furthermore, the constant C is the channel capacity (justifying the choice of nota-
tion).
Proof: The forward (if) part holds directly; hence, we only prove the converse
(only-if) part.
• Without loss of generality, we assume that PX (x) < 1 for all x ∈ X , since
PX (x) = 1 for some x implies that I(X; Y ) = 0.
4.5.2 Karush-Kuhn-Tucker cond. for chan. capacity I: 4-59
• The problem of calculating the channel capacity is to maximize

    I(X; Y) = Σ_{x∈X} Σ_{y∈Y} PX(x) PY|X(y|x) log2 [ PY|X(y|x) / Σ_{x′∈X} PX(x′) PY|X(y|x′) ],   (4.5.10)

subject to the condition

    Σ_{x∈X} PX(x) = 1                    (4.5.11)

for a given channel distribution PY|X.
• By using the Lagrange multiplier method (e.g., see Appendix B.10), maximizing
(4.5.10) subject to (4.5.11) is equivalent to maximizing:

    f(PX) := Σ_{x∈X} Σ_{y∈Y} PX(x) PY|X(y|x) log2 [ PY|X(y|x) / Σ_{x′∈X} PX(x′) PY|X(y|x′) ]
             + λ ( Σ_{x∈X} PX(x) − 1 ).
4.5.2 Karush-Kuhn-Tucker cond. for chan. capacity I: 4-60
• We then take the derivative of the above quantity with respect to PX(x′), and
obtain that

    ∂f(PX)/∂PX(x′) = I(x′; Y) − log2(e) + λ.

The details for taking the derivative are as follows:

    ∂/∂PX(x′) [ Σ_{x∈X} Σ_{y∈Y} PX(x) PY|X(y|x) log2 PY|X(y|x)
                − Σ_{x∈X} Σ_{y∈Y} PX(x) PY|X(y|x) log2 ( Σ_{x″∈X} PX(x″) PY|X(y|x″) )
                + λ ( Σ_{x∈X} PX(x) − 1 ) ]

    = Σ_{y∈Y} PY|X(y|x′) log2 PY|X(y|x′) − Σ_{y∈Y} PY|X(y|x′) log2 ( Σ_{x″∈X} PX(x″) PY|X(y|x″) )
      − log2(e) Σ_{x∈X} Σ_{y∈Y} PX(x) PY|X(y|x) PY|X(y|x′) / ( Σ_{x″∈X} PX(x″) PY|X(y|x″) ) + λ

    = I(x′; Y) − log2(e) Σ_{y∈Y} PY|X(y|x′) + λ

    = I(x′; Y) − log2(e) + λ.
4.5.2 Karush-Kuhn-Tucker cond. for chan. capacity I: 4-61
• By Property 2 of Lemma 2.46, I(X; Y) = I(PX, PY|X) is a concave function
in PX (for a fixed PY|X). Therefore,
1. the maximum of I(PX, PY|X) occurs at a zero derivative when PX(x) does
not lie on the boundary, namely 1 > PX(x) > 0.
2. For those PX(x) lying on the boundary, i.e., PX(x) = 0, the maximum
occurs iff a displacement from the boundary into the interior decreases the
quantity, which implies a non-positive derivative, namely

    I(x; Y) ≤ −λ + log2(e), for those x with PX(x) = 0.
• To summarize, if an input distribution PX achieves the channel capacity, then

    I(x; Y) = −λ + log2(e), for those x with PX(x) > 0;
    I(x; Y) ≤ −λ + log2(e), for those x with PX(x) = 0,

for some λ.
• With the above result, setting C = −λ + log2(e) yields (4.5.9).
• Finally, multiplying both sides of each equation in (4.5.9) by PX(x) and sum-
ming over x yields maxPX I(X; Y) on the left and the constant C on the
right, thus proving that the constant C is indeed the channel’s capacity. 2
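The KKT condition is easy to verify numerically for a concrete channel. In the sketch below (our own function names), the uniform input on the BSC(0.1) yields the same per-symbol mutual information I(x;Y) for both inputs, equal to the capacity 1 − hb(0.1):

```python
import math

def I_x(x, px, Q):
    """Mutual information I(x;Y) for a specific input symbol (Definition 4.24)."""
    ny = len(Q[0])
    py = [sum(px[a] * Q[a][y] for a in range(len(px))) for y in range(ny)]
    return sum(Q[x][y] * math.log2(Q[x][y] / py[y])
               for y in range(ny) if Q[x][y] > 0)

# BSC(0.1) under a uniform input: I(x;Y) should equal the same constant
# C = 1 - hb(0.1) for both inputs, as the KKT condition requires.
eps = 0.1
Q = [[1 - eps, eps], [eps, 1 - eps]]
px = [0.5, 0.5]
C = 1 + eps * math.log2(eps) + (1 - eps) * math.log2(1 - eps)
assert abs(I_x(0, px, Q) - C) < 1e-12
assert abs(I_x(1, px, Q) - C) < 1e-12
```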
4.5.2 Karush-Kuhn-Tucker cond. for chan. capacity I: 4-62
Question: Does the uniform input achieve the channel capacity iff
the channel is quasi-symmetric? No.
Observation 4.28 (Capacity achieved by a uniform input distribution)
• T-symmetric channels [319, Section V, Definition 1]: A channel is T-symmetric
if

    T(x) := I(x; Y) − log2 |X| = Σ_{y∈Y} PY|X(y|x) log2 [ PY|X(y|x) / Σ_{x′∈X} PY|X(y|x′) ]

is a constant function of x (i.e., functionally independent of x), where I(x; Y)
is the mutual information for input x under a uniform input distribution.
• An example of a T-symmetric channel that is not quasi-symmetric is the binary-
input ternary-output channel with the following transition matrix

    Q = [ 1/3  1/3  1/3 ]
        [ 1/6  1/6  2/3 ].
Its capacity is achieved by the uniform input distribution.
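T-symmetry of this example channel can be confirmed numerically (a sketch; the function names are ours). T(x) evaluates to the same constant for both inputs, so the uniform input satisfies the KKT condition and the capacity equals T(x) + log2 |X| ≈ 0.0817 bits:

```python
import math

# The binary-input ternary-output channel from Observation 4.28.
Q = [[1/3, 1/3, 1/3],
     [1/6, 1/6, 2/3]]

def T(x):
    """T(x) = sum_y P(y|x) log2( P(y|x) / sum_x' P(y|x') )."""
    return sum(Q[x][y] * math.log2(Q[x][y] / (Q[0][y] + Q[1][y]))
               for y in range(3))

# T-symmetry: T(x) is the same for both inputs ...
assert abs(T(0) - T(1)) < 1e-12
# ... so the uniform input achieves capacity C = T(x) + log2|X|.
C = T(0) + 1.0
assert 0.08 < C < 0.083
```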
4.5.2 Karush-Kuhn-Tucker cond. for chan. capacity I: 4-63
• Unlike quasi-symmetric channels, T-symmetric channels do not admit in gen-
eral a simple closed-form expression for their capacity (such as the one given
in (4.5.6)):

    C = Σ_{i=1}^{m} ai Ci                    (4.5.6)
4.4 Example of Polar Codes for the BEC I: 4-64
• Polar coding is a new channel coding method proposed by Arikan during 2008-
2009, which can provably achieve the capacity of any binary-input memoryless
channel Q whose capacity is realized by a uniform input distribution.
• The main idea behind polar codes is channel “polarization,” which transforms
n uses of BEC(ε) into extremal “polarized” channels; i.e., channels which are
either perfect (noiseless) or completely noisy.
• It is shown that as n → ∞, the number of unpolarized channels converges to
0 and the fraction of perfect channels converges to I(X; Y ) = 1 − ε under a
uniform input, which is the capacity of the BEC (see Example 4.22 in Section
4.5).
• A polar code can then be naturally obtained by sending information bits di-
rectly through those perfect channels and sending known bits (usually called
frozen bits) through the completely noisy channels.
4.4 Example of Polar Codes for the BEC I: 4-65
    U1 ──⊕──→ X1 ──→ BEC(ε) ──→ Y1
         ↑
    U2 ──┴──→ X2 ──→ BEC(ε) ──→ Y2
• We start with the simplest case (often named basic transformation) of n = 2.
• Under uniformly distributed X1 and X2, we have

    I(Q) := I(X1; Y1) = I(X2; Y2) = 1 − ε.

• Now consider the following linear modulo-2 operation:

    X1 = U1 ⊕ U2,
    X2 = U2,

where U1 and U2 represent uniformly distributed independent message bits.
4.4 Example of Polar Codes for the BEC I: 4-66
    U1 ──⊕──→ X1 ──→ BEC(ε) ──→ Y1
         ↑
    U2 ──┴──→ X2 ──→ BEC(ε) ──→ Y2
• The decoder performs successive cancellation decoding as follows.
– It first decodes U1 from the received (Y1, Y2),
– and then decodes U2 based on (Y1, Y2) and the previously decoded U1
(assuming the decoding is done correctly).
• This will create two new channels; namely the “worse” channel Q− and the
“better” channel Q+ given by

    Q− : U1 → (Y1, Y2),
    Q+ : U2 → (Y1, Y2, U1),

respectively (the names of these channels will be justified shortly).
4.4 Example of Polar Codes for the BEC I: 4-67
    U1 ──⊕──→ U1 ⊕ U2 ──→ BEC(ε) ──→ Y1
         ↑
    U2 ──┴──→ U2 ──────→ BEC(ε) ──→ Y2

• Q− : U1 is estimated as

    Y1 ⊕ Y2,  if Y1, Y2 ∈ {0, 1}
    ? ⊕ Y2,   if Y1 = E, Y2 ∈ {0, 1}
    Y1 ⊕ ?,   if Y1 ∈ {0, 1}, Y2 = E
    ? ⊕ ?,    if Y1 = Y2 = E

Noting that given output E for a BEC, the receiver knows “nothing” about
the input.
• Thus, Q− is a BEC with erasure probability ε− := 1 − (1 − ε)².
4.4 Example of Polar Codes for the BEC I: 4-68
    U1 ──⊕──→ U1 ⊕ U2 ──→ BEC(ε) ──→ Y1
         ↑
    U2 ──┴──→ U2 ──────→ BEC(ε) ──→ Y2

• Q+ : U2 is estimated as

    Y1 ⊕ U1,  if Y1 ∈ {0, 1}
    Y2,       if Y2 ∈ {0, 1}
    ?,        if Y1 = Y2 = E

• Q+ is a BEC with erasure probability ε+ := ε².

Thus, let U1 be the frozen bit and U2 be the info bit. One can transform the
system to a BEC(ε²) with code rate 1/2 bits/channel usage.
4.4 Example of Polar Codes for the BEC I: 4-69
The channel capacity remains the same:

    I(Q+) + I(Q−) = I(U2; Y1, Y2, U1) + I(U1; Y1, Y2)
                  = (1 − ε²) + [1 − (1 − (1 − ε)²)]
                  = 2(1 − ε)
                  = 2 I(Q).                    (4.4.1)
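The one-step erasure-probability recursion can be checked directly, together with the capacity-conservation identity (4.4.1). A minimal sketch (our own function name):

```python
def polarize(e):
    """One basic transformation: two uses of BEC(e) become the worse
    channel BEC(1-(1-e)^2) and the better channel BEC(e^2)."""
    return 1 - (1 - e) ** 2, e ** 2   # (e_minus, e_plus)

e = 0.5
e_minus, e_plus = polarize(e)
assert (e_minus, e_plus) == (0.75, 0.25)

# Capacity is preserved: I(Q+) + I(Q-) = 2 I(Q), as in (4.4.1).
assert abs((1 - e_plus) + (1 - e_minus) - 2 * (1 - e)) < 1e-12
```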
4.4 Example of Polar Codes for the BEC I: 4-70
• Now, let us consider the case of n = 4 and suppose we perform the basic trans-
formation twice to send (i.i.d. uniform) message bits (U1, U2, U3, U4), yielding

    Q− : V1 → (Y1, Y2), where X1 = V1 ⊕ V2,
    Q+ : V2 → (Y1, Y2, V1), where X2 = V2,
    Q− : V3 → (Y3, Y4), where X3 = V3 ⊕ V4,
    Q+ : V4 → (Y3, Y4, V3), where X4 = V4,

where V1 = U1 ⊕ U2, V3 = U2, V2 = U3 ⊕ U4 and V4 = U4.

• Applying the basic transformation once more yields

    Q−− : U1 → (Y1, Y2, Y3, Y4) with erasure probability ε−− := 1 − (1 − ε−)²
    Q+− : U3 → (Y1, Y2, Y3, Y4, U1, U2) with erasure probability ε+− := 1 − (1 − ε+)²
    Q−+ : U2 → (Y1, Y2, Y3, Y4, U1) with erasure probability ε−+ := (ε−)²
    Q++ : U4 → (Y1, Y2, Y3, Y4, U1, U2, U3) with erasure probability ε++ := (ε+)².
4.4 Example of Polar Codes for the BEC I: 4-71
In polar coding terminology,

• the process of using multiple basic transformations to get X1, . . . , Xn from
U1, . . . , Un (where the Ui’s are i.i.d. uniform message random variables) is
called channel “combining,”
• and that of using Y1, . . . , Yn and U1, . . . , Ui−1 to obtain Ui for i ∈ {1, . . . , n}
is called channel “splitting.”
• Altogether, the phenomenon is called channel “polarization.”
Example 4.14 Consider a BEC with erasure probability ε = 0.5 and let n = 8.
4.4 Example of Polar Codes for the BEC I: 4-72
[Figure: channel-combining circuit for n = 8 over BEC(0.5). The numbers in
parentheses are the erasure probabilities of the synthesized channels at each
stage, obtained by applying ε− = 1 − (1 − ε)² and ε+ = ε² three times:

    stage 1: 0.75, 0.25
    stage 2: 0.9375, 0.5625, 0.4375, 0.0625
    stage 3: U1: 0.9961, U2: 0.8789, U3: 0.8086, U4: 0.3164,
             U5: 0.6836, U6: 0.1914, U7: 0.1211, U8: 0.0039]
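The erasure probabilities of the n = 8 example can be reproduced by iterating the basic transformation, as in this sketch (the function name is our own):

```python
def polarize_levels(eps, levels):
    """Repeatedly apply the basic transformation to BEC(eps); after k
    levels this yields the 2^k synthesized channels' erasure probabilities."""
    es = [eps]
    for _ in range(levels):
        nxt = []
        for e in es:
            nxt += [1 - (1 - e) ** 2, e ** 2]   # worse channel, better channel
        es = nxt
    return es

es = polarize_levels(0.5, 3)   # n = 8 with three levels of polarization
assert len(es) == 8
# Extremes match the figure (0.0039 and 0.9961, to four decimals).
assert abs(min(es) - 0.00390625) < 1e-12
assert abs(max(es) - 0.99609375) < 1e-12
# Total capacity is conserved: sum_i (1 - eps_i) = 8 * (1 - 0.5) = 4.
assert abs(sum(1 - e for e in es) - 4.0) < 1e-12
```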
4.4 Example of Polar Codes for the BEC I: 4-73
• A key reason for the prevalence of polar coding after its invention is that polar
codes form the first coding scheme that has an explicit low-complexity construction
structure while being capable of achieving channel capacity as the code length
approaches infinity.
• More importantly, polar codes do not exhibit the error floor behavior, which
Turbo and (to a lesser extent) LDPC codes are prone to.
• Due to their attractive properties, polar codes were adopted in 2016 by the
3rd Generation Partnership Project (3GPP) as error correcting codes for the
control channel of the 5th generation (5G) mobile communication standard.
4.6 Lossless joint source-channel coding I: 4-74
and Shannon’s separation principle

• We next establish Shannon’s lossless joint source-channel coding theorem
(or lossless information transmission theorem), which provides explicit (and
directly verifiable) conditions for any communication system in terms of its
source and channel information-theoretic quantities under which the source can
be reliably transmitted (i.e., with asymptotically vanishing error probability).
• This key theorem is sometimes referred to as Shannon’s source-channel sep-
aration theorem or principle.
– Why is it named the “separation principle”?
– Answer: The theorem’s necessary and sufficient conditions for reliable trans-
missibility are a function of entirely “separable” or “disentangled” informa-
tion quantities, i.e., the source’s minimal compression rate and the chan-
nel’s capacity, with no quantities that depend on both the source and the
channel.

• We will prove the theorem by assuming that the source is stationary ergodic in
the forward part and just stationary in the converse part and that the channel
is a DMC.
• Note that the theorem can be extended to more general sources and channels
with memory (see Dobrushin 1963, Vembu & Verdu & Steinberg 1995, Chen
& Alajaji 1999).
Xn Yn
Source - Source - Channel - Channel - Channel - Source - Sink
Encoder Encoder Decoder Decoder

A separate (tandem) source-channel coding scheme.

Xn Yn
Source - Encoder - Channel - Decoder - Sink

A joint source-channel coding scheme.
4.6 Lossless joint source-channel coding I: 4-76
Definition 4.29 (Source-channel block code) Given a discrete source {Vi}∞i=1
with finite alphabet V and a discrete channel {PY n|X n }∞n=1 with finite input and
output alphabets X and Y, respectively, an m-to-n source-channel block code ∼Cm,n
with rate m/n source symbols/channel symbol is a pair of mappings (f (sc), g (sc)), where

    f (sc) : V m → X n    and    g (sc) : Y n → V m.

    V m ──→ [ Encoder f (sc) ] ──→ X n ──→ [ Channel PY n|X n ] ──→ Y n ──→ [ Decoder g (sc) ] ──→ V̂ m

An m-to-n block source-channel coding system.

The code’s error probability is given by

    Pe(∼Cm,n ) := Pr[V m ≠ V̂ m]
                = Σ_{v m∈V m} Σ_{y n∈Y n : g (sc)(y n) ≠ v m} PV m(v m) PY n|X n(y n |f (sc)(v m))

where PV m and PY n|X n are the source and channel distributions, respectively.
4.6 Lossless joint source-channel coding I: 4-77
Theorem 4.30 (Lossless joint source-channel coding theorem for rate-
one block codes) Consider a discrete source {Vi}∞i=1 with finite alphabet V and
entropy rate H(V) and a DMC with input alphabet X , output alphabet Y and
capacity C, where both H(V) and C are measured in the same units (i.e., they
both use the same base of the logarithm). Then the following hold:
• Forward part (achievability): For any 0 < ε < 1 and given that the source is
stationary ergodic, if

    H(V) < C,

then there exists a sequence of rate-one source-channel codes {∼Cm,m }∞m=1 such
that

    Pe(∼Cm,m ) < ε for sufficiently large m,

where Pe(∼Cm,m ) is the error probability of the source-channel code ∼Cm,m .
• Converse part: For any 0 < ε < 1 and given that the source is stationary, if

    H(V) > C,

then any sequence of rate-one source-channel codes {∼Cm,m }∞m=1 satisfies

    Pe(∼Cm,m ) > (1 − ε)µ for sufficiently large m,        (4.6.1)

where µ := HD (V) − CD with D = |V|, and HD (V) and CD are the entropy rate
and channel capacity measured in D-ary digits; i.e., the codes’ error probability
is bounded away from zero and it is not possible to transmit the source over
the channel via rate-one source-channel block codes with arbitrarily low error
probability.
4.6 Lossless joint source-channel coding I: 4-79
Proof of the forward part:

• Without loss of generality, we assume throughout this proof that both the
source entropy rate H(V) and the channel capacity C are measured in nats
(i.e., they are both expressed using the natural logarithm).
• Key idea: We will show the existence of the desired rate-one source-channel
codes ∼Cm,m via a separate (tandem or two-stage) source and channel coding
scheme.
• Let γ := C − H(V) > 0.
• Given any 0 < ε < 1, by the lossless source-coding theorem for stationary
ergodic sources (Theorem 3.15), there exists a sequence of source codes of
blocklength m and size Mm with

encoder fs : V m → {1, 2, . . . , Mm} and decoder gs : {1, 2, . . . , Mm} → V m

such that

    (1/m) log Mm < H(V) + γ/2                    (4.6.2)

and

    Pr [gs(fs(V m)) ≠ V m] < ε/2

for m sufficiently large.
4.6 Lossless joint source-channel coding I: 4-80
• Furthermore, by the channel coding theorem under the maximal probability of
error criterion (see Observation 4.6 and Theorem 4.11), there exists a sequence
of channel codes of blocklength m and size M̄m with encoder

    fc : {1, 2, . . . , M̄m} → X m

and decoder

    gc : Y m → {1, 2, . . . , M̄m}

such that

    (1/m) log M̄m > C − γ/2 = H(V) + γ/2 > (1/m) log Mm        (4.6.5)

and

    λ := max_{w∈{1,...,M̄m}} Pr [gc(Y m) ≠ w | X m = fc(w)] < ε/2

for m sufficiently large.
4.6 Lossless joint source-channel coding I: 4-81
• Now we form our source-channel code by concatenating in tandem the above
source and channel codes.
• Specifically, the m-to-m source-channel code ∼Cm,m has the following encoder-
decoder pair (f (sc), g (sc)):

    f (sc) : V m → X m with f (sc)(v m) = fc(fs(v m)) for all v m ∈ V m

and

    g (sc) : Y m → V m

with

    g (sc)(y m) = gs(gc(y m)), if gc(y m) ∈ {1, 2, . . . , Mm};
    g (sc)(y m) = arbitrary,   otherwise,

for all y m ∈ Y m.
• The above construction is possible since {1, 2, . . . , Mm} is a subset of
{1, 2, . . . , M̄m}.
4.6 Lossless joint source-channel coding I: 4-82
Pe(∼Cm,m ) = Pr[g (sc)(Y m) ≠ V m]
    = Pr[g (sc)(Y m) ≠ V m, gc(Y m) = fs(V m)]
      + Pr[g (sc)(Y m) ≠ V m, gc(Y m) ≠ fs(V m)]
    = Pr[gs(gc(Y m)) ≠ V m, gc(Y m) = fs(V m)]
      + Pr[g (sc)(Y m) ≠ V m, gc(Y m) ≠ fs(V m)]
    ≤ Pr[gs(fs(V m)) ≠ V m] + Pr[gc(Y m) ≠ fs(V m)]
    = Pr[gs(fs(V m)) ≠ V m]
      + Σ_{w∈{1,2,...,Mm}} Pr[fs(V m) = w] Pr[gc(Y m) ≠ w | fs(V m) = w]
    = Pr[gs(fs(V m)) ≠ V m]
      + Σ_{w∈{1,2,...,Mm}} Pr[X m = fc(w)] Pr[gc(Y m) ≠ w | X m = fc(w)]
    ≤ Pr[gs(fs(V m)) ≠ V m] + λ
    < ε/2 + ε/2 = ε

for m sufficiently large. Thus the source can be reliably sent over the channel via
rate-one block source-channel codes as long as H(V) < C.

Proof of the converse part: For simplicity, we assume in this proof that H(V)
and C are measured in bits.
For any m-to-m source-channel code ∼Cm,m , we can write

    H(V) ≤ (1/m) H(V m)                                              (4.6.6)
         = (1/m) H(V m |V̂ m) + (1/m) I(V m; V̂ m)
         ≤ (1/m) [Pe(∼Cm,m ) log2(|V|m) + 1] + (1/m) I(V m; V̂ m)     (4.6.7)
         ≤ Pe(∼Cm,m ) log2 |V| + 1/m + (1/m) I(X m; Y m)             (4.6.8)
         ≤ Pe(∼Cm,m ) log2 |V| + 1/m + C                             (4.6.9)

where
• (4.6.6) is due to the fact that (1/m)H(V m) is non-increasing in m and converges
to H(V) as m → ∞ since the source is stationary (see Observation 3.12),
• (4.6.7) follows from Fano’s inequality,

    H(V m|V̂ m) ≤ Pe(∼Cm,m ) log2(|V|m) + hb(Pe(∼Cm,m )) ≤ Pe(∼Cm,m ) log2(|V|m) + 1,

• (4.6.8) is due to the data processing inequality since V m → X m → Y m → V̂ m
form a Markov chain.
4.6 Lossless joint source-channel coding I: 4-84
Note that in the above derivation, the information measures are all measured in
bits. This implies that for m ≥ logD (2)/(εµ),

    Pe(∼Cm,m ) ≥ [H(V) − C]/log2(|V|) − 1/[m log2(|V|)] = µ − logD (2)/m ≥ (1 − ε)µ,

since [H(V) − C]/log2(|V|) = HD (V) − CD = µ and logD (2)/m ≤ εµ.
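For a feel of the converse bound, the following illustrative computation (the source and channel numbers are our own, not from the text) evaluates µ − logD(2)/m and confirms it stays above (1 − ε)µ once m ≥ logD(2)/(εµ):

```python
import math

# Illustrative numbers: a binary source with entropy rate 0.9 bits and a
# channel with capacity 0.5 bits, so D = |V| = 2 and
# mu = H_D(V) - C_D = 0.4 D-ary digits.
H_V, C, D = 0.9, 0.5, 2
mu = H_V - C                           # already in D-ary digits since D = 2
eps = 0.1
m_min = math.log(2, D) / (eps * mu)    # threshold log_D(2)/(eps*mu) = 25

# For every m >= m_min, the bound mu - log_D(2)/m is at least (1 - eps)*mu.
for m in (int(m_min) + 1, 100, 1000):
    lower = mu - math.log(2, D) / m
    assert lower >= (1 - eps) * mu - 1e-12
```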
4.6 Lossless joint source-channel coding I: 4-85
Theorem 4.32 (Lossless joint source-channel coding theorem for gen-
eral rate block codes) Under the same notation as in Theorem 4.30, the fol-
lowing hold:
• Forward part (achievability): For any 0 < ε < 1 and given that the source
is stationary ergodic, there exists a sequence of m-to-nm source-channel codes
{∼Cm,nm }∞m=1 such that

    Pe(∼Cm,nm ) < ε for sufficiently large m

if

    lim sup_{m→∞} m/nm < C/H(V).

• Converse part: For any 0 < ε < 1 and given that the source is stationary, any
sequence of m-to-nm source-channel codes {∼Cm,nm }∞m=1 with

    lim inf_{m→∞} m/nm > C/H(V)

satisfies

    Pe(∼Cm,nm ) > (1 − ε)µ for sufficiently large m,

for some positive constant µ that depends on lim inf_{m→∞} (m/nm), H(V) and C.
4.6 Lossless joint source-channel coding I: 4-86
Discussion: separate vs joint source-channel coding
• Shannon’s separation principle has provided the linchpin for most modern com-
munication systems where source coding and channel coding schemes are sep-
arately constructed (with the source (resp., channel) code designed by only
taking into account the source (resp., channel) characteristics) and applied in
tandem without the risk of sacrificing optimality in terms of reliable transmis-
sibility under unlimited coding delay and complexity.
• However, in practical implementations, there is a price to pay in delay and
complexity for extremely long coding blocklengths (particularly when delay and
complexity constraints are quite stringent such as in wireless communications
systems).
• Under finite coding blocklengths and/or complexity, many studies have demon-
strated that joint source-channel coding can provide better performance than
separate coding.
4.6 Lossless joint source-channel coding I: 4-87
• Even in the infinite blocklength regime where separate coding is optimal in
terms of reliable transmissibility, it can be shown that for a large class of sys-
tems, joint source-channel coding can achieve an error exponent that is as large
as double the error exponent resulting from separate coding. This indicates that
one can realize via joint source-channel coding the same performance as sepa-
rate coding while reducing the coding delay by half (this result translates into
notable power savings of more than 2 dB when sending binary sources over
channels with Gaussian noise, fading and output quantization).
Key Notes I: 4-88
• Definition of reliable transmission
• Discrete memoryless channels
• Data transmission code and its rate
• Joint typical set
• Shannon’s channel coding theorem and its converse theorem
• Fano’s inequality
• Calculation of the channel capacity
– Symmetric, weakly symmetric, quasi-symmetric and T -symmetric channels
– KKT condition
• Polar coding
• Joint source-channel coding theorem