Data Transmission and Channel Capacity
• Data transmission
– Carefully select codewords from the set of channel input words (of a given length) so that minimal ambiguity results at the channel receiver.
• E.g., to transmit a binary message through the following channel, in which inputs 00 and 01 are received as 0, and inputs 10 and 11 are received as 1 (each with probability 1).

The code (00 for event A, 10 for event B) obviously induces less ambiguity at the receiver than the code (00 for event A, 01 for event B), since the latter's two codewords yield the same channel output.
Reliable Transmission I: 4-2
• What is the maximum amount of information (per channel input) that can be
reliably transmitted via a given noisy channel?
– E.g., we can transmit 1 bit per channel use over the channel above (00 and 01 received as 0, 10 and 11 received as 1, each with probability 1) by the code

    Code = (00 for event A, 10 for event B).
Discrete memoryless channels I: 4-5
such that

    ∑_{y^n ∈ Y^n} P_{Y^n|X^n}(y^n | x^n) = 1.
[Figure: transition diagram of the binary symmetric channel (BSC): inputs X ∈ {0, 1}, outputs Y ∈ {0, 1}, with P_{Y|X}(0|0) = P_{Y|X}(1|1) = 1−ε and crossover probability ε.]

[Figure: transition diagram of the binary erasure channel (BEC): inputs X ∈ {0, 1}, outputs Y ∈ {0, E, 1}, with P_{Y|X}(0|0) = P_{Y|X}(1|1) = 1−α and erasure probability P_{Y|X}(E|0) = P_{Y|X}(E|1) = α.]

[Figure: transition diagram of the binary symmetric erasure channel (BSEC): inputs X ∈ {0, 1}, outputs Y ∈ {0, E, 1}, with P_{Y|X}(0|0) = P_{Y|X}(1|1) = 1−ε−α, crossover probability ε and erasure probability α.]
• One can combine the BSC with the BEC to obtain a binary channel with
both errors and erasures.
• The channel’s transition matrix is given by

    Q = [p_{x,y}] =
        [ p_{0,0}  p_{0,E}  p_{0,1} ]   [ 1−ε−α    α      ε    ]
        [ p_{1,0}  p_{1,E}  p_{1,1} ] = [   ε      α    1−ε−α  ]      (4.2.8)
where ε, α ∈ [0, 1] are the channel’s crossover and erasure probabilities,
respectively.
• Clearly, setting α = 0 reduces the BSEC to the BSC, and setting ε = 0
reduces the BSEC to the BEC.
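The BSEC matrix (4.2.8) and its two limiting cases can be checked numerically. The sketch below is illustrative (the function name `bsec_matrix` and the parameter values are my own choices, not from the text):

```python
import numpy as np

def bsec_matrix(eps, alpha):
    """Transition matrix Q = [p_{x,y}] of the BSEC, with output
    columns ordered (0, E, 1) as in (4.2.8)."""
    return np.array([[1 - eps - alpha, alpha, eps],
                     [eps, alpha, 1 - eps - alpha]])

Q = bsec_matrix(0.1, 0.2)
assert np.allclose(Q.sum(axis=1), 1.0)   # each row is a distribution

# alpha = 0 recovers the BSC (the erasure column becomes all-zero) ...
bsc = bsec_matrix(0.1, 0.0)[:, [0, 2]]
assert np.allclose(bsc, [[0.9, 0.1], [0.1, 0.9]])

# ... and eps = 0 recovers the BEC.
bec = bsec_matrix(0.0, 0.2)
assert np.allclose(bec, [[0.8, 0.2, 0.0], [0.0, 0.2, 0.8]])
```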
Frequently used channels I: 4-12
• More generally, the channel need not be symmetric in the sense of having identical transition parameters when input bit 0 or 1 is sent. For example, the channel’s transition matrix can be given by

    Q = [p_{x,y}] =
        [ p_{0,0}  p_{0,E}  p_{0,1} ]   [ 1−ε−α     α        ε     ]
        [ p_{1,0}  p_{1,E}  p_{1,1} ] = [   ε′      α′     1−ε′−α′ ]      (4.2.10)

where in general ε ≠ ε′ and α ≠ α′. We call such a channel an asymmetric channel with errors and erasures.
where

    λ_w(∼Cn) := Pr[Ŵ ≠ W | W = w] = Pr[g(Y^n) ≠ w | X^n = f(w)]
              = ∑_{y^n ∈ Y^n : g(y^n) ≠ w} P_{Y^n|X^n}(y^n | f(w))

is the code’s conditional probability of decoding error given that message w is sent over the channel.
4.3 Block codes for data transmission over DMCs I: 4-18
Observation 4.6 Another, more conservative error criterion is the so-called maximal probability of error

    λ(∼Cn) := max_{w ∈ {1,2,···,M}} λ_w(∼Cn).

Clearly,

    Pe(∼Cn = (n, M)) ≤ λ(∼Cn = (n, M)).

However,

    2 × Pe(∼Cn = (n, M)) ≥ λ(∼Cn′ = (n, M/2)),

where ∼Cn′ is constructed by throwing away from ∼Cn the half of its codewords with the largest conditional probabilities of error λ_w(∼Cn). So

    (1/2) λ(∼Cn′) ≤ Pe(∼Cn) ≤ λ(∼Cn)

with code rates

    R = (1/n) log2(M)  and  R′ = (1/n) log2(M/2) = R − 1/n.
Consequently, a reliable transmission rate R under the average probability of error
criterion is also a reliable transmission rate under the maximal probability of error
criterion.
Definition 4.7 (Jointly typical set) The set Fn(δ) of jointly δ-typical n-tuple pairs (x^n, y^n) with respect to the memoryless distribution

    P_{X^n,Y^n}(x^n, y^n) = ∏_{i=1}^n P_{X,Y}(x_i, y_i)

is defined by

    Fn(δ) := { (x^n, y^n) ∈ X^n × Y^n :
        | −(1/n) log2 P_{X^n}(x^n) − H(X) | < δ,
        | −(1/n) log2 P_{Y^n}(y^n) − H(Y) | < δ,
        and | −(1/n) log2 P_{X^n,Y^n}(x^n, y^n) − H(X, Y) | < δ }.
In short, a pair (xn, y n ) generated by independently drawing n times under PX,Y is
jointly δ-typical if its joint and marginal empirical entropies are respectively δ-close
to the true joint and marginal entropies.
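The membership test of Definition 4.7 is easy to probe empirically. The sketch below uses a hypothetical joint pmf and sample size (my own illustrative choices): it draws (x^n, y^n) i.i.d. from P_{X,Y} and checks the three δ-closeness conditions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint pmf P_{X,Y} on {0,1} x {0,1} (illustrative).
P = np.array([[0.4, 0.1],
              [0.1, 0.4]])
Px, Py = P.sum(axis=1), P.sum(axis=0)
H = lambda p: float(-np.sum(p * np.log2(p)))
Hx, Hy, Hxy = H(Px), H(Py), H(P.ravel())

def jointly_typical(xs, ys, delta):
    """Membership test for F_n(delta) per Definition 4.7."""
    ex  = -np.log2(Px[xs]).mean()       # empirical marginal entropy of x^n
    ey  = -np.log2(Py[ys]).mean()
    exy = -np.log2(P[xs, ys]).mean()    # empirical joint entropy
    return abs(ex - Hx) < delta and abs(ey - Hy) < delta and abs(exy - Hxy) < delta

# Draw (x^n, y^n) i.i.d. from P_{X,Y}; by the joint AEP it is typical w.h.p.
n = 2000
idx = rng.choice(4, size=n, p=P.ravel())
xs, ys = idx // 2, idx % 2
print(jointly_typical(xs, ys, delta=0.1))
```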
Theorem 4.8 (Joint AEP) If (X1, Y1), (X2, Y2), . . ., (Xn, Yn), . . . are i.i.d., i.e., {(Xi, Yi)}_{i=1}^∞ is a pair of dependent DMSs, then

    −(1/n) log2 P_{X^n}(X1, X2, . . . , Xn) → H(X) in probability,

    −(1/n) log2 P_{Y^n}(Y1, Y2, . . . , Yn) → H(Y) in probability,

and

    −(1/n) log2 P_{X^n,Y^n}((X1, Y1), . . . , (Xn, Yn)) → H(X, Y) in probability

as n → ∞.

Proof: By the weak law of large numbers, we have the desired result. 2
The channel’s operational capacity, Cop, is the supremum of all achievable rates:
Cop = sup{R : R is achievable}.
• The next theorem shows Cop = C, i.e., the information capacity is equal
to the operational capacity.
where the maximum is taken over all input distributions PX . Then the following
hold.
• Forward part (achievability): For any 0 < ε < 1, there exist γ > 0 and a sequence of data transmission block codes {∼Cn = (n, Mn)}_{n=1}^∞ with

    C > lim inf_{n→∞} (1/n) log2 Mn ≥ C − γ
and
Pe(∼Cn) < ε for sufficiently large n,
where Pe(∼Cn) denotes the (average) probability of error for block code ∼Cn .
• Converse part: For any 0 < ε < 1, any sequence of data transmission block codes {∼Cn = (n, Mn)}_{n=1}^∞ with

    lim inf_{n→∞} (1/n) log2 Mn > C

satisfies

    Pe(∼Cn) > (1 − ε)µ for sufficiently large n,    (4.3.1)

where

    µ = 1 − C / ( lim inf_{n→∞} (1/n) log2 Mn ) > 0,

i.e., the codes’ probability of error is bounded away from zero for all n sufficiently large.
Notes:
• (4.3.1) actually implies that

    lim inf_{n→∞} Pe(∼Cn) ≥ lim_{ε↓0} (1 − ε)µ = µ,

so the error probability lower bound has nothing to do with ε. Here we state the converse of Theorem 4.11 in a form parallel to the converse statements in Theorems 3.6 and 3.15.
• Also note that the mutual information I(X; Y) is actually a function of the input statistics PX and the channel statistics P_{Y|X}. Hence, we may write it as

    I(PX, P_{Y|X}) = ∑_{x∈X} ∑_{y∈Y} PX(x) P_{Y|X}(y|x) log2 [ P_{Y|X}(y|x) / ∑_{x′∈X} PX(x′) P_{Y|X}(y|x′) ].
– Hence, there must exist at least one desired good code sequence {∼Cn*}_{n=1}^∞ among them (with Pe(∼Cn*) → 0 as n → ∞).
Denote by P_{Ŷ^n} the channel output distribution due to the channel input product distribution P_{X̂^n} with P_{X̂^n}(x^n) = ∏_{i=1}^n P_{X̂}(x_i); in other words,

    P_{Ŷ^n}(y^n) = ∑_{x^n ∈ X^n} P_{X̂^n,Ŷ^n}(x^n, y^n)

and

    P_{X̂^n,Ŷ^n}(x^n, y^n) := P_{X̂^n}(x^n) P_{Y^n|X^n}(y^n | x^n)

for all x^n ∈ X^n and y^n ∈ Y^n.

– Note that since P_{X̂^n}(x^n) = ∏_{i=1}^n P_{X̂}(x_i) and the channel is memoryless, the resulting joint input-output process {(X̂i, Ŷi)}_{i=1}^∞ is also memoryless with

    P_{X̂^n,Ŷ^n}(x^n, y^n) = ∏_{i=1}^n P_{X̂,Ŷ}(x_i, y_i)

and

    P_{X̂,Ŷ}(x, y) = P_{X̂}(x) P_{Y|X}(y|x) for x ∈ X, y ∈ Y.
where Fn(δ) is defined in Definition 4.7 with respect to distribution PX̂ n,Ŷ n .
(We assume that the codebook ∼Cn and the channel distribution PY |X are
known at both the encoder and the decoder.)
    Fn(δ) := { (x^n, y^n) ∈ X^n × Y^n :
        | −(1/n) log2 P_{X^n}(x^n) − H(X) | < δ,  | −(1/n) log2 P_{Y^n}(y^n) − H(Y) | < δ,
        and | −(1/n) log2 P_{X^n,Y^n}(x^n, y^n) − H(X, Y) | < δ }
where
– the first term in (4.3.3) covers the case in which the received channel output y^n is not jointly δ-typical with c_m (and hence the decoding rule g_n(·) would possibly make a wrong guess), and
– the second term in (4.3.3) reflects the situation in which y^n is jointly δ-typical not only with the transmitted codeword c_m but also with another codeword c_{m′}, m′ ≠ m (which may cause a decoding error).
    E_{∼Cn}[Pe(∼Cn)] = ∑_{∼Cn} Pr[∼Cn] Pe(∼Cn)
      = ∑_{c1 ∈ X^n} · · · ∑_{c_{Mn} ∈ X^n} P_{X̂^n}(c1) · · · P_{X̂^n}(c_{Mn}) (1/Mn) ∑_{m=1}^{Mn} λ_m(∼Cn)
      = (1/Mn) ∑_{m=1}^{Mn} ∑_{c1 ∈ X^n} · · · ∑_{c_{m−1} ∈ X^n} ∑_{c_{m+1} ∈ X^n} · · · ∑_{c_{Mn} ∈ X^n}
            P_{X̂^n}(c1) · · · P_{X̂^n}(c_{m−1}) P_{X̂^n}(c_{m+1}) · · · P_{X̂^n}(c_{Mn})
            × ∑_{c_m ∈ X^n} P_{X̂^n}(c_m) λ_m(∼Cn)
    ≤ (1/Mn) ∑_{m=1}^{Mn} ∑_{c1 ∈ X^n} · · · ∑_{c_{m−1} ∈ X^n} ∑_{c_{m+1} ∈ X^n} · · · ∑_{c_{Mn} ∈ X^n}
            P_{X̂^n}(c1) · · · P_{X̂^n}(c_{m−1}) P_{X̂^n}(c_{m+1}) · · · P_{X̂^n}(c_{Mn}) × P_{X̂^n,Ŷ^n}(F_n^c(δ))
      + (1/Mn) ∑_{m=1}^{Mn} ∑_{c1 ∈ X^n} · · · ∑_{c_{m−1} ∈ X^n} ∑_{c_{m+1} ∈ X^n} · · · ∑_{c_{Mn} ∈ X^n}
            P_{X̂^n}(c1) · · · P_{X̂^n}(c_{m−1}) P_{X̂^n}(c_{m+1}) · · · P_{X̂^n}(c_{Mn})
            × ∑_{m′=1, m′≠m}^{Mn} ∑_{c_m ∈ X^n} ∑_{y^n ∈ Fn(δ|c_{m′})} P_{X̂^n,Ŷ^n}(c_m, y^n)    (4.3.5)
where (4.3.5) follows from (4.3.4), and the last step holds since P_{X̂^n,Ŷ^n}(F_n^c(δ)) is a constant independent of c1, . . ., c_{Mn} and m.
(Then for n > N0)

    ∑_{m′=1, m′≠m}^{Mn} ∑_{c1 ∈ X^n} · · · ∑_{c_{m−1} ∈ X^n} ∑_{c_{m+1} ∈ X^n} · · · ∑_{c_{Mn} ∈ X^n}
        P_{X̂^n}(c1) · · · P_{X̂^n}(c_{m−1}) P_{X̂^n}(c_{m+1}) · · · P_{X̂^n}(c_{Mn})
        × ∑_{c_m ∈ X^n} ∑_{y^n ∈ Fn(δ|c_{m′})} P_{X̂^n,Ŷ^n}(c_m, y^n)
    = ∑_{m′=1, m′≠m}^{Mn} ∑_{c_{m′} ∈ X^n} P_{X̂^n}(c_{m′}) ∑_{c_m ∈ X^n} ∑_{y^n ∈ Fn(δ|c_{m′})} P_{X̂^n,Ŷ^n}(c_m, y^n)
    = ∑_{m′=1, m′≠m}^{Mn} ∑_{c_{m′} ∈ X^n} P_{X̂^n}(c_{m′}) ∑_{y^n ∈ Fn(δ|c_{m′})} ∑_{c_m ∈ X^n} P_{X̂^n,Ŷ^n}(c_m, y^n)
    = ∑_{m′=1, m′≠m}^{Mn} ∑_{c_{m′} ∈ X^n} P_{X̂^n}(c_{m′}) ∑_{y^n ∈ Fn(δ|c_{m′})} P_{Ŷ^n}(y^n)
    = ∑_{m′=1, m′≠m}^{Mn} ∑_{(c_{m′}, y^n) ∈ Fn(δ)} P_{X̂^n}(c_{m′}) P_{Ŷ^n}(y^n)
    ≤ ∑_{m′=1, m′≠m}^{Mn} |Fn(δ)| 2^{−n(H(X̂)−δ)} 2^{−n(H(Ŷ)−δ)}
    ≤ ∑_{m′=1, m′≠m}^{Mn} 2^{n(H(X̂,Ŷ)+δ)} 2^{−n(H(X̂)−δ)} 2^{−n(H(Ŷ)−δ)}
Consequently,

    E_{∼Cn}[Pe(∼Cn)] ≤ P_{X̂^n,Ŷ^n}(F_n^c(δ)) + 2^{−nδ},

which for sufficiently large n (and n > N0) can be made smaller than 2δ = γ/4 < ε by the Shannon-McMillan-Breiman theorem for pairs. 2
Fano’s inequality I: 4-39
• Hence, for n ≥ N,

    Pe(∼Cn) ≥ 1 − (C + 1/n) / ((1/n) log2 Mn) > 1 − [1 − (1 − ε)µ] = (1 − ε)µ > 0;

i.e., Pe(∼Cn) is bounded away from zero for n sufficiently large. 2
• Converse part: For any 0 < ε < 1, any sequence of data transmission block codes {∼Cn = (n, Mn)}_{n=1}^∞ with

    R = lim inf_{n→∞} (1/n) log2 Mn > C

satisfies

    Pe(∼Cn) > (1 − ε)µ for sufficiently large n,

where

    µ = 1 − C / ( lim inf_{n→∞} (1/n) log2 Mn ) = 1 − C/R > 0,

i.e., the codes’ probability of error is bounded away from zero for all n sufficiently large.
[Figure: plot of the error-probability lower bound µ = 1 − C/R versus the asymptotic coding rate R: µ = 0 for R ≤ C, and µ increases toward 1 as R grows beyond C.]
Observation 4.12
• The results of the above channel coding theorem are illustrated in the figure above, where

    R = lim inf_{n→∞} Rn = lim inf_{n→∞} (1/n) log2 Mn   message bits/channel use

is usually called the asymptotic coding rate of channel block codes, and Rn is the code rate for codes of blocklength n.
– For a more general channel, three regions instead of two may result, i.e.,
  (i) R < C, (ii) C < R < C̄, and (iii) R > C̄,
which respectively correspond to
  (i) lim sup_{n→∞} Pe(Cn) = 0 for the best block code,
  (ii) lim sup_{n→∞} Pe(Cn) > 0 but lim inf_{n→∞} Pe(Cn) = 0 for the best block code, and
  (iii) lim inf_{n→∞} Pe(Cn) > 0 for all channel block codes.
Definition 4.15
• A DMC with finite input alphabet X , finite output alphabet Y and channel
transition matrix Q = [px,y ] of size |X |×|Y| is said to be symmetric if the rows
of Q are permutations of each other and the columns of Q are permutations of
each other.
• The channel is said to be weakly-symmetric if the rows of Q are permutations
of each other and all the column sums in Q are equal.
Proof:
• The mutual information between the channel’s input and output is given by

    I(X; Y) = H(Y) − H(Y|X) = H(Y) − ∑_{x∈X} PX(x) H(Y|X = x)

where

    H(Y|X = x) = −∑_{y∈Y} P_{Y|X}(y|x) log2 P_{Y|X}(y|x) = −∑_{y∈Y} p_{x,y} log2 p_{x,y}.

• Noting that every row of Q is a permutation of every other row, we obtain that H(Y|X = x) is independent of x and can be written as

    H(Y|X = x) = H(q1, q2, · · · , q_{|Y|})

where (q1, q2, · · · , q_{|Y|}) is any row of Q.
4.5 Calculating channel capacity I: 4-49
• Thus

    H(Y|X) = ∑_{x∈X} PX(x) H(q1, q2, · · · , q_{|Y|})
           = H(q1, q2, · · · , q_{|Y|}) ∑_{x∈X} PX(x)
           = H(q1, q2, · · · , q_{|Y|}).

This implies

    I(X; Y) = H(Y) − H(q1, q2, · · · , q_{|Y|})
            ≤ log2 |Y| − H(q1, q2, · · · , q_{|Y|})

with equality achieved iff Y is uniformly distributed over Y.
• The proof is completed by confirming that for a weakly symmetric channel, the
uniform input distribution induces the uniform output distribution (see the
text). 2
Example 4.18 (Capacity of the BSC) Since the BSC with crossover proba-
bility (or bit error rate) ε is symmetric, we directly obtain from Lemma 4.16 that
its capacity is achieved by a uniform input distribution and is given by
C = log2(2) − H(1 − ε, ε) = 1 − hb(ε) (4.5.5)
where hb(·) is the binary entropy function.
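Equation (4.5.5) can be evaluated directly. A minimal sketch (function names are illustrative):

```python
import numpy as np

def hb(p):
    """Binary entropy function h_b(p) in bits (h_b(0) = h_b(1) = 0)."""
    if p in (0.0, 1.0):
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def bsc_capacity(eps):
    return 1.0 - hb(eps)   # equation (4.5.5)

assert bsc_capacity(0.0) == 1.0                 # noiseless: 1 bit/channel use
assert bsc_capacity(0.5) == 0.0                 # useless channel
assert abs(bsc_capacity(0.11) - 0.5) < 0.01    # hb(0.11) is close to 1/2
```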
Quasi- = “having some, but not all of the features of” such as quasi-scholar and
quasi-official.
where

    a_i := ∑_{y∈Y_i} p_{x,y} = sum of any row in Q_i,   i = 1, · · · , m,

and

    C_i = log2 |Y_i| − H( any row in the matrix (1/a_i) Q_i ),   i = 1, · · · , m.
Example 4.22 (Capacity of the BEC) The BEC with erasure probability α and transition matrix

    Q = [ P_{Y|X}(0|0)  P_{Y|X}(E|0)  P_{Y|X}(1|0) ]   [ 1−α   α    0  ]
        [ P_{Y|X}(0|1)  P_{Y|X}(E|1)  P_{Y|X}(1|1) ] = [  0    α   1−α ]

is quasi-symmetric (but neither weakly-symmetric nor symmetric).
• Its transition matrix Q can be partitioned along its columns into two symmetric (hence weakly-symmetric) sub-matrices

    Q1 = [ 1−α   0  ]         Q2 = [ α ]
         [  0   1−α ]   and        [ α ].
• Thus applying the capacity formula for quasi-symmetric channels of Lemma 4.21 yields that the capacity of the BEC is given by

    C = a1 C1 + a2 C2

where a1 = 1 − α, a2 = α,

    C1 = log2(2) − H( (1−α)/(1−α), 0/(1−α) ) = 1 − H(1, 0) = 1 − 0 = 1,

and

    C2 = log2(1) − H( α/α ) = 0 − 0 = 0.

Therefore, the BEC capacity is given by

    C = (1 − α)(1) + (α)(0) = 1 − α.    (4.5.7)
Example 4.23 (Capacity of the BSEC) Similarly, the BSEC with crossover probability ε and erasure probability α and transition matrix

    Q = [p_{x,y}] =
        [ p_{0,0}  p_{0,E}  p_{0,1} ]   [ 1−ε−α   α     ε    ]
        [ p_{1,0}  p_{1,E}  p_{1,1} ] = [   ε     α   1−ε−α  ]

is quasi-symmetric; its transition matrix can be partitioned along its columns into two symmetric sub-matrices

    Q1 = [ 1−ε−α     ε   ]         Q2 = [ α ]
         [   ε     1−ε−α ]   and        [ α ].

Hence by Lemma 4.21, the channel capacity is given by C = a1 C1 + a2 C2 where a1 = 1 − α, a2 = α,

    C1 = log2(2) − H( (1−ε−α)/(1−α), ε/(1−α) ) = 1 − hb( (1−ε−α)/(1−α) ),

and

    C2 = log2(1) − H( α/α ) = 0.
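The decomposition formula C = ∑_i a_i C_i of Lemma 4.21 can be checked numerically for both examples. A sketch (helper names and parameter values are my own, illustrative choices):

```python
import numpy as np

def H(p):
    """Entropy (bits) of a probability vector, ignoring zero entries."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def quasi_symmetric_capacity(sub_matrices):
    """C = sum_i a_i * C_i over the symmetric sub-matrices Q_i, where a_i is
    the (common) row sum of Q_i and C_i = log2|Y_i| - H(any row of Q_i / a_i)."""
    C = 0.0
    for Qi in sub_matrices:
        Qi = np.atleast_2d(np.asarray(Qi, dtype=float))
        a = Qi[0].sum()                 # same for every row by symmetry
        C += a * (np.log2(Qi.shape[1]) - H(Qi[0] / a))
    return C

eps, alpha = 0.1, 0.2
# BSEC split into Q1 (non-erasure columns) and Q2 (erasure column):
C_bsec = quasi_symmetric_capacity([[[1-eps-alpha, eps], [eps, 1-eps-alpha]],
                                   [[alpha], [alpha]]])
# Agrees with Example 4.23: C = (1-alpha)(1 - hb((1-eps-alpha)/(1-alpha)))
hb = lambda p: H([p, 1 - p])
assert abs(C_bsec - (1-alpha)*(1 - hb((1-eps-alpha)/(1-alpha)))) < 1e-12

# BEC (eps = 0) reduces to 1 - alpha, per (4.5.7).
C_bec = quasi_symmetric_capacity([[[1-alpha, 0], [0, 1-alpha]],
                                  [[alpha], [alpha]]])
assert abs(C_bec - (1 - alpha)) < 1e-12
```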
Lemma 4.25 (KKT condition for channel capacity) For a given DMC, an input distribution PX achieves its channel capacity iff there exists a constant C such that

    I(x; Y) = C   ∀ x ∈ X with PX(x) > 0;
    I(x; Y) ≤ C   ∀ x ∈ X with PX(x) = 0.    (4.5.9)

Furthermore, the constant C is the channel capacity (justifying the choice of notation).
Proof: The forward (if) part holds directly; hence, we only prove the converse
(only-if) part.
• Without loss of generality, we assume that PX (x) < 1 for all x ∈ X , since
PX (x) = 1 for some x implies that I(X; Y ) = 0.
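Condition (4.5.9) can be verified numerically for the BSC, whose capacity-achieving input is uniform. A sketch, where I(x; Y) is taken as ∑_y P_{Y|X}(y|x) log2( P_{Y|X}(y|x) / P_Y(y) ) (function names are illustrative):

```python
import numpy as np

def I_x(Q, Px, x):
    """I(x; Y) = sum_y P(y|x) * log2( P(y|x) / P_Y(y) )."""
    Py = Px @ Q                       # output distribution induced by Px
    row = Q[x]
    mask = row > 0
    return float(np.sum(row[mask] * np.log2(row[mask] / Py[mask])))

# BSC with crossover 0.1; the uniform input is capacity-achieving.
Q = np.array([[0.9, 0.1], [0.1, 0.9]])
Px = np.array([0.5, 0.5])
I0, I1 = I_x(Q, Px, 0), I_x(Q, Px, 1)
assert abs(I0 - I1) < 1e-12           # KKT: I(x; Y) equal for all used inputs
# ... and the common value is the capacity C = 1 - hb(0.1).
assert abs(I0 - (1 + 0.9*np.log2(0.9) + 0.1*np.log2(0.1))) < 1e-12
```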
4.5.2 Karush-Kuhn-Tucker cond. for chan. capacity I: 4-59
• We then take the derivative of the above quantity with respect to PX(x′) and obtain

    ∂f(PX)/∂PX(x′) = I(x′; Y) − log2(e) + λ.
    C = ∑_{i=1}^m a_i C_i    (4.5.6)
4.4 Example of Polar Codes for the BEC I: 4-64
• Polar coding is a new channel coding method proposed by Arikan during 2008-
2009, which can provably achieve the capacity of any binary-input memoryless
channel Q whose capacity is realized by a uniform input distribution.
• The main idea behind polar codes is channel “polarization,” which transforms
n uses of BEC(ε) into extremal “polarized” channels; i.e., channels which are
either perfect (noiseless) or completely noisy.
• It is shown that as n → ∞, the fraction of unpolarized channels converges to 0 and the fraction of perfect channels converges to I(X; Y) = 1 − ε under a uniform input, which is the capacity of the BEC (see Example 4.22 in Section 4.5).
• A polar code can then be naturally obtained by sending information bits di-
rectly through those perfect channels and sending known bits (usually called
frozen bits) through the completely noisy channels.
[Figure: the basic polar transform for n = 2: X1 = U1 ⊕ U2 and X2 = U2, with X1 and X2 each sent through an independent copy of BEC(ε), producing outputs Y1 and Y2.]

• Q−: the first synthesized channel decodes U1 from (Y1, Y2):

    U1 estimate = Y1 ⊕ Y2,  if Y1, Y2 ∈ {0, 1}
                = ?,        if Y1 = E and/or Y2 = E

Note that given output E for a BEC, the receiver knows “nothing” about the input.
• Thus, Q− is a BEC with erasure probability ε− := 1 − (1 − ε)^2.
• Q+: assuming U1 is already known, the second synthesized channel decodes U2 from (Y1, Y2, U1):

    U2 estimate = Y1 ⊕ U1,  if Y1 ∈ {0, 1}
                = Y2,       if Y2 ∈ {0, 1}
                = ?,        if Y1 = Y2 = E

Thus Q+ is a BEC with erasure probability ε+ := ε^2.
• Now, let us consider the case of n = 4 and suppose we perform the basic trans-
formation twice to send (i.i.d. uniform) message bits (U1, U2 , U3, U4), yielding
Example 4.14 Consider a BEC with erasure probability ε = 0.5 and let n = 8.
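The recursion ε → (1 − (1−ε)^2, ε^2) behind Q− and Q+ can be iterated to observe polarization for this example's parameters. A minimal sketch (the ordering of the synthesized channels in the list is an illustrative choice):

```python
def polarize(eps, levels):
    """Erasure probabilities of the 2**levels synthesized channels obtained by
    recursively applying the transform e -> (1 - (1-e)**2, e**2)."""
    chans = [eps]
    for _ in range(levels):
        chans = [e for c in chans for e in (1 - (1 - c)**2, c * c)]
    return chans

# BEC(0.5) with n = 8 channel uses, i.e., 3 levels of polarization.
chans = polarize(0.5, 3)
print([round(e, 4) for e in chans])

# The BEC capacity 1 - eps is preserved: the average erasure probability
# stays at eps, while individual channels drift toward 0 (perfect) or 1.
assert abs(sum(chans) / len(chans) - 0.5) < 1e-12
```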
• A key reason for the prevalence of polar coding after its invention is that polar codes form the first coding scheme with an explicit low-complexity construction that provably achieves channel capacity as the code length approaches infinity.
• More importantly, polar codes do not exhibit the error-floor behavior to which Turbo and (to a lesser extent) LDPC codes are prone.
• Due to their attractive properties, polar codes were adopted in 2016 by the
3rd Generation Partnership Project (3GPP) as error correcting codes for the
control channel of the 5th generation (5G) mobile communication standard.
4.6 Lossless joint source-channel coding I: 4-74
• We will prove the theorem by assuming that the source is stationary ergodic in
the forward part and just stationary in the converse part and that the channel
is a DMC.
• Note that the theorem can be extended to more general sources and channels
with memory (see Dobrushin 1963, Vembu & Verdu & Steinberg 1995, Chen
& Alajaji 1999).
[Figure: tandem (separate) coding chain: Source → Source Encoder → Channel Encoder → Channel (X^n → Y^n) → Channel Decoder → Source Decoder → Sink; and joint coding chain: Source → Encoder → Channel (X^n → Y^n) → Decoder → Sink.]
Definition 4.29 (Source-channel block code) Given a discrete source {Vi}_{i=1}^∞ with finite alphabet V and a discrete channel {P_{Y^n|X^n}}_{n=1}^∞ with finite input and output alphabets X and Y, respectively, an m-to-n source-channel block code ∼C_{m,n} with rate m/n source symbols/channel symbol is a pair of mappings (f^{(sc)}, g^{(sc)}), where

    f^{(sc)} : V^m → X^n  and  g^{(sc)} : Y^n → V^m,
where P_{V^m} and P_{Y^n|X^n} are the source and channel distributions, respectively.
• Converse part: For any 0 < ε < 1 and given that the source is stationary, if H(V) > C, then any sequence of rate-one source-channel codes {∼C_{m,m}}_{m=1}^∞ satisfies

    Pe(∼C_{m,m}) > (1 − ε)µ for m sufficiently large.
Proof of the converse part: For simplicity, we assume in this proof that H(V)
and C are measured in bits.
For any m-to-m source-channel code ∼C_{m,m}, we can write

    H(V) ≤ (1/m) H(V^m)    (4.6.6)
         = (1/m) H(V^m | V̂^m) + (1/m) I(V^m; V̂^m)
         ≤ (1/m) [Pe(∼C_{m,m}) log2(|V|^m) + 1] + (1/m) I(V^m; V̂^m)    (4.6.7)
         ≤ Pe(∼C_{m,m}) log2 |V| + 1/m + (1/m) I(X^m; Y^m)    (4.6.8)
         ≤ Pe(∼C_{m,m}) log2 |V| + 1/m + C    (4.6.9)
where
• (4.6.6) is due to the fact that (1/m)H(V m) is non-increasing in m and converges
to H(V) as m → ∞ since the source is stationary (see Observation 3.12),
• (4.6.7) follows from Fano’s inequality,
    H(V^m | V̂^m) ≤ Pe(∼C_{m,m}) log2(|V|^m) + hb(Pe(∼C_{m,m})) ≤ Pe(∼C_{m,m}) log2(|V|^m) + 1,
• (4.6.8) is due to the data processing inequality since V m → X m → Y m → V̂ m
form a Markov chain.
Note that in the above derivation, the information measures are all measured in bits. This implies that for m ≥ log_D(2)/(εµ) (with D = |V|),

    Pe(∼C_{m,m}) ≥ (H(V) − C)/log2(|V|) − 1/(m log2(|V|)) = [H_D(V) − C_D] − log_D(2)/m ≥ (1 − ε)µ,

where the first bracketed term equals µ and the subtracted term is at most εµ.
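For a concrete feel of this bound, the sketch below evaluates µ = (H(V) − C)/log2|V| and the finite-m lower bound for a hypothetical instance (a fair-coin source over a BSC with crossover 0.2; all choices here are illustrative):

```python
import numpy as np

def hb(p):
    """Binary entropy function in bits."""
    return float(-p*np.log2(p) - (1-p)*np.log2(1-p))

# Hypothetical instance: fair-coin source V (H(V) = 1 bit, |V| = 2)
# sent at rate one over a BSC(0.2), so C = 1 - hb(0.2) < H(V).
HV, logV = 1.0, 1.0                    # log2 |V| = 1
C = 1.0 - hb(0.2)
mu = (HV - C) / logV                   # the constant mu of the converse
assert mu > 0                          # H(V) > C: reliable transmission fails

# Finite-m lower bound P_e >= mu - 1/(m * log2 |V|) approaches mu:
for m in (10, 100, 1000):
    print(m, mu - 1.0 / (m * logV))
```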
    lim inf_{m→∞} (m/n_m) > C/H(V),

satisfies

    Pe(∼C_{m,n_m}) > (1 − ε)µ for sufficiently large m,

for some positive constant µ that depends on lim inf_{m→∞} (m/n_m), H(V) and C.
• Shannon’s separation principle has provided the linchpin for most modern com-
munication systems where source coding and channel coding schemes are sep-
arately constructed (with the source (resp., channel) code designed by only
taking into account the source (resp., channel) characteristics) and applied in
tandem without the risk of sacrificing optimality in terms of reliable transmis-
sibility under unlimited coding delay and complexity.
• However, in practical implementations, there is a price to pay in delay and
complexity for extremely long coding blocklengths (particularly when delay and
complexity constraints are quite stringent such as in wireless communications
systems).
• Under finite coding blocklengths and/or complexity, many studies have demon-
strated that joint source-channel coding can provide better performance than
separate coding.