Data Compression

The document provides a comprehensive study material on data compression techniques for M.Tech Digital Communications students, covering topics such as entropy coding, Huffman coding, and various predictive and transform coding methods. It includes theoretical concepts, formulas, and practical applications relevant to digital broadcasting standards. The material is structured for quick revision and preparation for university examinations.

M.Tech Digital Communications


MTDC 15

DATA COMPRESSION TECHNIQUES

Complete Study Material for Semester Examination


Beginner-Friendly | Theory + Formulas + Diagrams | Quick Revision

University Exam: 100 Marks

Sessional: 50 Marks

Duration: 3 Hours

Instruction: 3 Periods per week


UNIT I: Entropy Coding & Source Models

1.1 Why Data Compression?


Data compression removes REDUNDANCY from data to reduce its size. Example: the text "AAAAAABBB" has redundancy — we can represent it as "6A3B", which is much shorter! Two types:
• Lossless compression: no data is lost; the original is perfectly reconstructed (ZIP, PNG, FLAC)
• Lossy compression: some data is lost; the original can't be perfectly reconstructed (JPEG, MP3, H.264)

1.2 Information Theory Basics


Information content of symbol s: I(s) = log2(1/p(s)) bits. The RARER a symbol, the MORE information it carries!
Entropy H(X), the average information per symbol: H(X) = -Σ p(x)·log2(p(x)) bits/symbol.
Entropy is the MINIMUM average code length achievable (Shannon's source coding theorem).
Example: Fair coin: H = -0.5·log2(0.5) - 0.5·log2(0.5) = 1 bit. Biased coin (p = 0.9 heads): H = -0.9·log2(0.9) - 0.1·log2(0.1) ≈ 0.469 bits.
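
The entropy formula above is easy to check numerically; a minimal Python sketch (the function name is just for illustration):

```python
import math

def entropy(probs):
    """H(X) = -sum p(x) * log2(p(x)), in bits/symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(round(entropy([0.5, 0.5]), 3))  # fair coin -> 1.0 bit
print(round(entropy([0.9, 0.1]), 3))  # biased coin -> 0.469 bits
```

Note how the biased coin carries less than half the information per toss, which is exactly the redundancy a good coder can squeeze out.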

1.3 Huffman Coding


Huffman coding assigns SHORT codes to FREQUENT symbols and LONG codes to RARE symbols. Construction algorithm:
1. Sort symbols by probability (low to high)
2. Combine the two LOWEST-probability symbols into a node (sum their probabilities)
3. Repeat until only one node is left (the root)
4. Assign 0/1 to the left/right branches
5. The code for each symbol is the path from root to leaf
Example: p(A)=0.5, p(B)=0.25, p(C)=0.125, p(D)=0.125 → A→0, B→10, C→110, D→111
Average length = 0.5×1 + 0.25×2 + 0.125×3 + 0.125×3 = 1.75 bits (entropy = 1.75 bits ✓)
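
The construction steps above can be sketched with a priority queue; this is a minimal illustration, not production code (tie-breaking order can change individual codes but never their lengths):

```python
import heapq

def huffman_codes(probs):
    """Build a Huffman code. probs: dict mapping symbol -> probability."""
    # Heap entries: (probability, unique tiebreak, tree); a tree is a symbol or (left, right)
    heap = [(p, i, s) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)  # two LOWEST-probability nodes
        p2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, count, (t1, t2)))  # merged node
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")  # 0 on the left branch
            walk(tree[1], prefix + "1")  # 1 on the right branch
        else:
            codes[tree] = prefix or "0"
    walk(heap[0][2], "")
    return codes

probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
codes = huffman_codes(probs)
avg = sum(probs[s] * len(codes[s]) for s in probs)
print(codes, avg)  # average length 1.75 bits, matching the entropy here
```

Because the probabilities are exact powers of 1/2 in this example, the Huffman average length hits the entropy exactly; in general it is within 1 bit of it.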

1.4 Arithmetic Coding


Arithmetic coding encodes an ENTIRE message as a single number in [0, 1). Process:
1. Start with the interval [0, 1)
2. For each symbol, subdivide the current interval according to the symbol probabilities
3. Output a number within the final interval
Advantage: can get arbitrarily close to the entropy (better than Huffman for short messages or skewed probabilities)
Used in: JPEG 2000, H.264, and HEVC video codecs.
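
A sketch of the interval-subdivision step described above (encoding only; real coders also handle finite precision and stream termination):

```python
def arithmetic_interval(message, probs):
    """Narrow [low, high) by subdividing once per symbol."""
    # Cumulative probability of the symbols that sort before each symbol
    cum, acc = {}, 0.0
    for s in sorted(probs):
        cum[s] = acc
        acc += probs[s]
    low, high = 0.0, 1.0
    for s in message:
        width = high - low
        high = low + width * (cum[s] + probs[s])
        low = low + width * cum[s]
    return low, high  # any number in [low, high) encodes the message

low, high = arithmetic_interval("AAB", {"A": 0.8, "B": 0.2})
print(low, high)  # ~[0.512, 0.64); width = P("AAB") = 0.8 * 0.8 * 0.2
```

The final interval's width equals the message probability, so encoding it takes about -log2(width) bits, i.e. essentially the message's information content.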

1.5 Run-Length Encoding (RLE)


RLE replaces RUNS of the same symbol with a (count, symbol) pair. Example: AAABBCCCC → (3,A)(2,B)(4,C)
Best for: binary images (fax), simple graphics, executable files
CCITT Group 3 fax uses modified Huffman coding of the run lengths.
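
The run-splitting step is a few lines of Python (a sketch of the idea, not any particular file format):

```python
def rle_encode(s):
    """Replace each run of equal symbols with a (count, symbol) pair."""
    runs = []
    for ch in s:
        if runs and runs[-1][1] == ch:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, ch])      # start a new run
    return [(n, ch) for n, ch in runs]

print(rle_encode("AAABBCCCC"))  # [(3, 'A'), (2, 'B'), (4, 'C')]
```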

1.6 Ziv-Lempel (LZ) Coding


LZ coding is a DICTIONARY-based method that finds repeated patterns.
• LZ77: uses a sliding window as the dictionary; encodes matches as (offset, length, next char)
• LZ78: builds an explicit dictionary
• LZW (used in GIF, TIFF): modified LZ78 — starts with single characters in the dictionary and adds phrases as it encodes
GZIP uses LZ77 + Huffman coding and is one of the most widely used compression formats!
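
A toy LZW encoder following the description above (real implementations cap the dictionary size and emit fixed- or variable-width codes):

```python
def lzw_encode(text):
    """LZW: dictionary starts with single characters; phrases are added as we go."""
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    w, out = "", []
    for ch in text:
        if w + ch in dictionary:
            w += ch                               # extend the current phrase
        else:
            out.append(dictionary[w])             # emit code for longest known phrase
            dictionary[w + ch] = len(dictionary)  # add the new phrase
            w = ch
    if w:
        out.append(dictionary[w])
    return out

print(lzw_encode("ABABABA"))  # [0, 1, 2, 4] — codes 2 and 4 are learned phrases
```

Notice that compression kicks in as soon as the input repeats: the codes 2 ("AB") and 4 ("ABA") each stand for a multi-character phrase.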

1.7 Waveform Characterization


Source models describe statistical properties:
• Stationary source: statistics don't change with time
• Ergodic source: time average = ensemble average
For quantization: the optimal (Lloyd-Max) quantizer chooses decision boundaries and reconstruction levels that minimize MSE. Quantization SNR for a uniform quantizer with R bits: SNR ≈ 6.02R + 1.76 dB (increases ~6 dB per bit of resolution — an important formula!)
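
The 6 dB/bit rule is worth memorizing; a quick numeric check (the 1.76 dB constant assumes a full-scale sinusoidal input):

```python
def quantizer_snr_db(bits):
    """Peak SNR of a uniform quantizer with R bits: ~6.02*R + 1.76 dB."""
    return 6.02 * bits + 1.76

for r in (8, 16):
    print(r, quantizer_snr_db(r))  # 8 bits ~49.9 dB, 16 bits (CD audio) ~98.1 dB
```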
UNIT II: Predictive Coding

2.1 DPCM (Differential Pulse Code Modulation)


Instead of encoding the signal x(n) directly, encode the PREDICTION ERROR e(n) = x(n) - x_hat(n). Since consecutive samples are correlated, the prediction error has SMALLER variance than the signal → it needs fewer bits!
Block diagram: x(n) → subtract the prediction x_hat(n) → e(n) → [Quantizer] → e_hat(n) → output, with the quantized error fed back into the predictor.
At the decoder: x_hat(n) = x_hat(n-1) + e_hat(n) (simple 1st-order prediction)
DPCM gain: G_DPCM = σ²x / σ²e (ratio of signal variance to prediction-error variance)
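
A minimal first-order DPCM loop, assuming a uniform quantizer (the predictor uses the previously reconstructed sample, so encoder and decoder stay in sync):

```python
def dpcm_encode(samples, step):
    """Quantize e(n) = x(n) - x_hat(n); predictor = previous reconstructed sample."""
    recon_prev, codes = 0, []
    for x in samples:
        e = x - recon_prev                  # prediction error
        q = round(e / step)                 # uniform quantizer index
        codes.append(q)
        recon_prev = recon_prev + q * step  # reconstruction, as the decoder will see it
    return codes

def dpcm_decode(codes, step):
    recon, prev = [], 0
    for q in codes:
        prev = prev + q * step
        recon.append(prev)
    return recon

x = [10, 12, 13, 13, 15]
print(dpcm_encode(x, step=1))  # [10, 2, 1, 0, 2] — small after the first sample
print(dpcm_decode(dpcm_encode(x, step=1), step=1))  # [10, 12, 13, 13, 15]
```

After the first sample, the error values are much smaller than the raw samples, which is exactly the variance reduction G_DPCM measures.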

2.2 ADPCM (Adaptive DPCM)


The predictor and/or quantizer step size ADAPTS to the signal statistics. Adaptive quantizer: the step size ∆(n) adjusts based on recent error magnitudes.
Large errors → increase ∆ (coarser quantization for fast changes)
Small errors → decrease ∆ (finer quantization for slow changes)
ADPCM is used in:
• G.726 (ITU standard): 32 kb/s voice coding (telephone quality)
• CD-quality audio compressed to lower rates
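
A toy sketch of the step-size adaptation rule above (the multipliers, thresholds, and clamps here are illustrative; G.726 uses standardized lookup tables):

```python
def adapt_step(step, q_index, grow=1.5, shrink=0.9, lo=0.5, hi=64.0):
    """Grow the step after large quantized errors, shrink it after small ones."""
    if abs(q_index) >= 2:
        step *= grow    # coarser quantization for fast changes
    else:
        step *= shrink  # finer quantization for slow changes
    return min(max(step, lo), hi)  # keep the step within sane bounds

step = 4.0
for q in (3, 3, 0, 0, 1):  # burst of large errors, then a quiet stretch
    step = adapt_step(step, q)
print(step)  # grew to 9.0, then decayed back toward ~6.56
```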

2.3 Motion Compensated Prediction for Video


Video frames are HIGHLY correlated — consecutive frames are very similar! Motion estimation: find where each block in the current frame came from in the previous frame. Block matching: move a block around the reference frame to find the best match. Motion vector: the (dx, dy) displacement of the block.
Types of frames:
• I-frame (Intra): independent, no prediction (like a JPEG image)
• P-frame (Predicted): coded as the difference from a previous frame using motion vectors
• B-frame (Bidirectional): coded from both previous AND future frames
GOP (Group of Pictures): the I-P-P-P-B-B-P-B-B-... pattern in MPEG
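The block-matching search can be sketched as an exhaustive SAD (sum of absolute differences) minimization over a small window; the frame sizes and search range here are toy values:

```python
def best_motion_vector(ref, cur_block, top, left, search=1):
    """Try every (dy, dx) offset in the search range; keep the minimum-SAD match."""
    n = len(cur_block)
    best, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > len(ref) or x + n > len(ref[0]):
                continue  # candidate block falls outside the reference frame
            sad = sum(abs(ref[y + i][x + j] - cur_block[i][j])
                      for i in range(n) for j in range(n))
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad

ref = [[0, 0, 0, 0],
       [0, 9, 8, 0],
       [0, 7, 6, 0],
       [0, 0, 0, 0]]
block = [[9, 8], [7, 6]]  # content that sits at offset (1, 1) in the reference
print(best_motion_vector(ref, block, top=0, left=0))  # ((1, 1), 0): perfect match
```

With a perfect match (SAD = 0), the encoder only has to transmit the motion vector; otherwise it also codes the small residual block.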
UNIT III: Transform Coding

3.1 Transform Coding Concept


Transform coding converts the signal to a TRANSFORM DOMAIN where energy is CONCENTRATED in a few coefficients. Steps:
1. Divide the signal into blocks (typically 8×8 for images)
2. Apply a transform (DCT, DFT, wavelet)
3. Most transform coefficients are near zero → quantize them coarsely (or set them to zero)
4. Transmit/store only the NON-ZERO coefficients
Key: the transform DECORRELATES the data. The Karhunen-Loeve Transform (KLT) is theoretically optimal but requires knowing the signal statistics. DCT ≈ KLT for natural images (that's why JPEG uses the DCT!)

3.2 DCT (Discrete Cosine Transform)


DCT-II (the most common form, 'the DCT'):
X(k) = √(2/N) · c(k) · Σ x(n)·cos(π·k·(2n+1)/(2N)) for k = 0, 1, ..., N-1, where c(0) = 1/√2 and c(k) = 1 otherwise
Properties:
• Real-valued (unlike the DFT, which is complex)
• Energy compaction: most energy lands in the first few coefficients
• Used in JPEG, MPEG, MP3, H.264, H.265
JPEG compression process:
1. Convert RGB → YCbCr (with chroma subsampling)
2. Divide into 8×8 blocks
3. Apply the DCT to each block
4. Quantize coefficients (larger step size → more compression → lower quality)
5. Zigzag scan → run-length encode → Huffman code
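
The DCT-II formula, computed directly for a short 1-D signal (an O(N²) sketch; real codecs use fast O(N log N) algorithms):

```python
import math

def dct2(x):
    """Orthonormal DCT-II: X(k) = sqrt(2/N) * c(k) * sum x(n) cos(pi*k*(2n+1)/(2N))."""
    N = len(x)
    out = []
    for k in range(N):
        c = math.sqrt(0.5) if k == 0 else 1.0  # c(0) = 1/sqrt(2)
        out.append(math.sqrt(2.0 / N) * c *
                   sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                       for n in range(N)))
    return out

X = dct2([1.0, 2.0, 3.0, 4.0])  # smooth ramp: energy piles into early coefficients
print([round(v, 3) for v in X])  # X[0] (the DC term) is 5.0
```

Because this version is orthonormal, the total energy is preserved (Parseval), but almost all of it lands in X[0] and X[1]; that compaction is what makes coarse quantization of the tail coefficients cheap.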

3.3 Wavelet-Based Compression


Wavelets give a MULTI-RESOLUTION decomposition — great for images with edges. 1D DWT: split into approximation (lowpass) and detail (highpass) coefficients using filter banks. 2D for images: apply to rows then columns → 4 subbands (LL, LH, HL, HH). JPEG 2000 uses wavelets (Daubechies 9/7 for lossy, 5/3 for lossless). Advantages over JPEG: no blocking artifacts, progressive transmission, better quality at the same bitrate.
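
One level of the simplest wavelet, the Haar DWT, shows the approximation/detail split (JPEG 2000 uses the longer Daubechies filters, but the structure is the same):

```python
import math

def haar_dwt_1d(x):
    """One Haar DWT level: pairwise sums (approximation) and pairwise
    differences (detail), scaled by 1/sqrt(2) to preserve energy."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

a, d = haar_dwt_1d([4.0, 4.0, 8.0, 8.0])
print(a, d)  # detail coefficients are zero where the signal is locally constant
```

The near-zero detail coefficients are what get quantized away; the approximation band can then be split again for the next resolution level.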
UNIT IV: Digital Broadcasting Standards

4.1 Vector Quantization (VQ)


Instead of quantizing one sample at a time (scalar quantization), VQ quantizes VECTORS (groups of samples) together.
Codebook: a set of representative vectors (codewords) {c1, c2, ..., cN}
Encoding: for each input vector x, find the NEAREST codeword ci
Output: just the INDEX of the nearest codeword (log2(N) bits)
LBG algorithm (Linde-Buzo-Gray): design the codebook by k-means clustering
Advantage: better compression than scalar quantization (it can exploit correlations between samples)
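
Nearest-codeword encoding is just a minimum-distance search; a sketch with a hypothetical 3-entry codebook:

```python
def vq_encode(vector, codebook):
    """Output only the INDEX of the nearest codeword (squared-distance search)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(vector, codebook[i]))

codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]  # illustrative 2-D codewords
print(vq_encode((0.9, 1.2), codebook))  # 1: nearest codeword is (1.0, 1.0)
```

Only the index travels to the decoder (here log2(3) ≈ 1.6 bits for two samples); the decoder looks the codeword up in its copy of the codebook.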

4.2 Fractal Image Compression


Fractal compression exploits SELF-SIMILARITY in images — parts of an image look like scaled, transformed versions of other parts. Process:
1. Divide the image into small 'range' blocks (e.g., 4×4)
2. Search for larger 'domain' blocks that resemble each range block (after scaling)
3. Store the affine transform (scale, rotation, offset) instead of the pixels
Advantage: very high compression ratios; resolution-independent
Disadvantage: very slow encoding time

4.3 Digital Broadcasting Standards


MPEG (Moving Pictures Experts Group) video standards:
• MPEG-1: VCD quality (~1.5 Mb/s), MP3 audio
• MPEG-2: DVD, digital TV (DVB, ATSC), up to 15 Mb/s
• MPEG-4: streaming, mobile video; includes H.264 (AVC)
• H.265/HEVC: ~2x better compression than H.264, used in 4K streaming
• H.266/VVC: the latest standard, ~2x better than HEVC
Audio standards:
• MP3 (MPEG-1 Audio Layer III): 128 kb/s for near-CD quality
• AAC: better than MP3 at the same bitrate (iTunes, YouTube, mobile)
• Dolby AC-3: 5.1 surround sound for cinema/DVD
Key standards for digital broadcasting: DVB (Europe), ATSC (USA), ISDB (Japan)
QUICK REVISION TABLE

Topic               | Key Points                               | Formula/Keyword
Entropy             | Min avg code length                      | H(X) = -Σ p(x)·log2(p(x))
Huffman             | Short code = frequent symbol             | Optimal prefix-free code
Arithmetic coding   | Encode whole msg as one number           | Closer to entropy than Huffman
LZ77/LZW            | Dictionary-based; sliding window         | Used in GZIP, GIF
DPCM                | Code prediction error; smaller variance  | e(n) = x(n) - x_hat(n)
Motion compensation | I/P/B frames; motion vectors             | Used in MPEG/H.264/H.265
DCT                 | Energy compaction; real-valued           | Used in JPEG, MP3, H.264
Quantization SNR    | 6 dB gain per bit                        | SNR ≈ 6.02R + 1.76 dB
JPEG 2000           | Wavelet-based; no blocking               | Daubechies 9/7 wavelet
MPEG-4/H.264        | Video streaming standard                 | AVC = Advanced Video Coding
