Efficient DCT Quantization Table Optimization
Efficient DCT Quantization Table Optimization
Quantization Tables*
Viresh Ratnakar Miron Livny
Unzverszty of Wzsconszn-Madison Cnzverszty of Wzsconszn-Madzson
C o m p u t e r Sczences D e p a r t m e n t C o m p u t e r S c i e n c ~ sD e p o r t m e n t
Modison, WI 5 3 7 0 6 M a d i s o n , WI 53706
Emazl: ratnakar@[Link] Emoil: m,ron@[Link]
Abstract
The Discret? Cosine Transform (DCT) is widely used in lossy image and video compression schemes
such as JPEG and MPEG In this paper we describe RD-OPT. an efficient algorithm for constructing
DCT quantization tables with optimal rate-distortion tradeoffs for a given image. The algorithm uses
DCT coefficient distribution statistics in a novel way and uses a dynamic programming strategy t o
produce optimal quantization tables over a wide range of rates and distortions I t ran he used to
compress images at any desired signal-to-noise ratio or compressed size.
1 Introduction
T h e Discrete Cosine Transform [ANR74] lies at the heart of most commonly used lossy image and video
compression schemes [PM93, MP911. T h e extent of compression achieved depends upon the coarseness
of quantization of the transform coefficients. The coarser the quantization, the lesser the entropy of the
quantized coefficients. But coarse quantization also leads to poor quality of the decompressed image. ’rhus,
the quantization table used directly determines the rate-distortion tradeoff. i e , the comprrssion-cluality
tradeoff
Several approaches have been tried in order to design quantization tables for particnlar distortioii or
rate specifications. T h e most coinnion of these is to use a drfault table and scale it up or down by a
scalar multiplier to vary quality and coniprrssion TVr have shown in [RL94]that this might not give the
best tradeoff possible. Other approaches include psycho-visual niodel based cpantizatioii [hP92,TVatS3],
rate-distortion model based quantization [Jai89], and stochastic optimization technicpies [MS93]
In this paper we present RD-OPT, an efficient algorithm for optimum quantization t,able design that
does not rely on visual or rate-distortion models The algorithm admits a widr range of quality ineasnres
(including [Link] PSNR)and produces quantization tables optimizing the tradeoff bet,wern quality
and compressed size. A key feature of the algorithm is that i t simultaneously optimizes quantization tables
over a wide range of rates and distortions.
Authorized licensed use limited to: The University of Manchester. Downloaded on April 19,2010 at [Link] UTC from IEEE Xplore. Restrictions apply.
333
Thus, the D C T orders the information-content of an image into parts with different degrees of visual im-
portance. ‘These parts can then be selectively discarded or stored with varying degrees of accuracy, for lossy
compression of the image. The compression-ratio increases as more and more information is thrown away.
DCT-based compression techniques typically allow the user to specify a table (called the quantization table)
that stipulates the level of accuracy with which each spatial-frequency component is to he stored.
We now briefly describe the basic steps used in DCT-based image compression and decompression, and
thereby present some notation. Details on various standards can be found in the standard documents.
Details on D C T itself can be found in [RYSO].
Let I be a W x H image with pixel values in the range [0 . . . M I . The DCT-based compression process
consists of the following steps.
1. The image is divided into 8 x 8 blocks. To each image block f , the DCT is applied to get an 8 x 8
block: f of DCT coefficients. Each coefficient represents the amount of a particular spatial-frequency
content in f. The lowest frequency coefficient (also called the DC: coefficient) is f ( 0 , O ) while the highest
frequency coefficient is f ( 7 , 7 ) .
2. An 8 x 8-matrix of integers Q, called the quantization table, is used to quantize the coefficients in f
to form f Q . For notational convenience, we number the 64 entries in each 8 x 8 image block and each
8 x 8 block of DCT coefficients in raster-scan order, and use this ordering t o refer t o individual entries
+
in the various blocks. Thus, f(u,v) is referred to as f [ 8 u U ] . Using this notation,
3 The block f~ is ent,ropy-coded. using (for example) Huffman codes to exploit similarities across blocks,
to give the compressed block E ( ~ Q[PM93].
) The sequence of these compressed blocks forms the
compressed image.
1. Each entropy-coded block E ( ~ Qis)decoded to get the corresponding block of quantized coefficients
fQ .
3 The two-dimensional Inverse Discrete Cosine Transform (IDC’IC), is applied to f’ t o get the decom-
pressed image block f’. ‘These decompressed blocks form the decompressed image I‘
Authorized licensed use limited to: The University of Manchester. Downloaded on April 19,2010 at [Link] UTC from IEEE Xplore. Restrictions apply.
334
This causes differences in pixel values between the original image block f and its approximation, the decom-
pressed image block f’ We refer to the mean-squared-error between f and 7 as dzstortzon in f caused by
Q and denote it by d(f,Q)
1 63
d(l,Q)= E (f[nI - f’[nI)’
n=O
The distortion D ( I ,Q) between an image I and its approximation I’ due to quantization by Q is similarly
defined as the mean-squared-error between corresponding pixel values. Clearly, D ( I ,Q) is the mean value
o f all d ( f , Q) over all constituent blocks f in I The distortion D ( I , Q) is used to judge the “quality” of 1‘,
in quality measures such as Signal-to-Noise Ratio (SKR) and Peak Signal-to-Noise Ratio (PSNR):
Here S is the mean-squared pixel value over I Higher distortion implies poorer quality and vice versa
Clearly, as entries in Q are made higher, D ( 1 . Q) tends to increase.
We can also define the distortion between the DCT coefficient-block f and its approximation f’ resulting
from quantization by Q ,as
Our approach to designing rate-distortion-optimal quantization tables exploits the following nice property
of the DCT.
4f.
Q!= d ( S , Q!
That is, the mean squared error in pixel-domain is the same as the mean squared error in DCT-domain
This can be seen as follows.
DCT(f) = f
IDCT(?) = f’
=+ DCT(f’) = f’
3 DCT(f-f’) = f-7, using linearity of DCT
j E
:
=, - f’[n])’=
(f[n] cFz,
- j i [ n ] ) ’ , since DCT preserves
(f[n] LZ norms
* d(f,Q) = d(f,Q)
This implies that D ( I ,&) can he split as a sum into distortions in various DCT coefficients. Let D n ( I ,q ) be
defined as
1
= -lrean{ti[nl
Dn(i,q)64 - P[~II’I,
Where i’[ii]= ( f ^ [ n ] / / qq). and the mean is taken over all t h e blocks in the image. Then
63
D ( 1 . Q! = D n ( I .Q[nl!
n=u
Since we are mainly concerened with the effects of different quantization tables on a given image. we
denote the distortion D ( I , Q ) simply as D ( Q ) Similarly, we denotr the distortion U , ( i , q ) in t,he nth
coefficient by D,(q) Then,
63
D(Q) = 1DiL(Q[nl)
n=O
(1)
This decomposit,ion of D(Q)into contributions from individual coefficients is important, as it simplifies the
task of minimizing D ( & ) to that of minimiaing a sum. each of whose components depends on just one entry
in Q. We now show how the rate (or compressed size) resulting from Q can also be split similarly into a
sum These two properties of the DCT allow the problem of optimizing the rate-distortion tradeoff to be
solved using a dynamic programming approach for miiiiiiiizing two sums.
Authorized licensed use limited to: The University of Manchester. Downloaded on April 19,2010 at [Link] UTC from IEEE Xplore. Restrictions apply.
335
Where the entropy is measured over all the blocks in the image'. Then
63
N Q )KZ Rn(Q[nl)
n=O
Thus, R ( Q ) can also be decomposed into a sum of contributions from individual coefficients, just like D ( Q ) .
1. Given a target distortion A, find Q such that D ( Q ) 5 A and the rate R(Q)is minimized.
2. Given a target rate B bits per pixel (bpp), find Q such that R(Q) 5 B and the distortion D ( Q ) is
minimized.
We call the quantization tables Q that satisfy these conditions (for some A or B ) ED-optimal.
RD-OPT takes an image I as input and produces RD-optimalquantization tables for a wide range of rates
and distortions The contributions to rate and distortion of individual coefficients (Rn(&[n]) and &(Q[n]),
respectively) just depend on the entry Q[n]of 4'. RD-OPT first calculates R n ( Q [ n ]and ) Dn(Q[n])for
each possible value of Q [ n ] and
, then uses a dynamic programming approach t o minimize sums of Rn(Q[n])
against sums of D, (&[.I). To calculate R,(y) and Dn(q)for each possible y, a preliminary pass through the
image is run to gather DCT statistics which are used in a novel way. RD-OPT can be described at a high
lpvrl as.
'If ( f [ n ] j qtakes
) thc value ZI in a fraction p , > 0 of all blocks f , then this entropy is -E,
p,logzp,.
Authorized licensed use limited to: The University of Manchester. Downloaded on April 19,2010 at [Link] UTC from IEEE Xplore. Restrictions apply.
336
Algorithm RD-OPT
Input: An image I of width W and height H , with pixel values in the range [0 . . M ]
Step 2. Use the statistics to calculate R,(q) and D,(q) for each possible p (Procedures F i l l R and F i l l D ) .
W’e now present each of the three steps in RD-OPT in detail. Pseudo-code for all the procedures described
in this section can he found in Appendix A .
1. How many times does the nth coefficient get quantized to value zi when Q[n] = q?
2. What is the mean-squared error for the nth coefficient when &[n] = q?
c / / q = Bucket(c)//(Zp).
Hence, it suffices to gather statistics by counting the number of times each D C T coefficient takes a value in
a particular bucket, as this count can then be used to calculate the number of times a particular quantized
coefficient takes a particular value. The unquantized coefficient value itself can he approximated to within3
ztO.25.
Let OccursCount[O . .63][-2VMAX ZVMAX] be an array with the value OccursCount[n][v] being the num-
ber of blocks where the nth D C T coefficient en is such that Bucket(c,) = U . T h e constant VMAX is the
maximum absolute value any D C T coefficient can take (for 1-byte samples, M = 255 and VMAX = 2048). T h e
array OccursCount is filled by the procedure G a t h e r S t a t s .
G a t h e r S t a t s works by going through each 8 x 8 block f in I and calculating its Discrete Cosine Transform
g . For each coefficient g [ n ] ,the count OccursCount[n][Bucket(g[n])]is incremented by one.
Authorized licensed use limited to: The University of Manchester. Downloaded on April 19,2010 at [Link] UTC from IEEE Xplore. Restrictions apply.
337
OccursCount[n][v] such that vll(2.q) is equal t o QuantizedVal. If F is the total number of blocks in I , then,
the entropy is calculated as:
Count Count
Entropy = - F l o g ’ 7
QuantizedVal
F i l l D fills the array D by calculating the error in quantizing the nth coefficient by q , for each n and q . For
each integer t i in the range -2VMAX. . .2VMAX, the nth coefficient gets quantized to the value QuantizedVal
(= v i ( 2 9 ) ) in OccursCount[n][v] blocks. The actual (unquantized) value of the coefficient in each of these
blocks is estimated by the variable OriginalVal., to within 1k0.25. For each U , the error is incremented by
OccursCount[n][v] , (OriginalVal - q QuantizedVal)’.
f:
k=O
R[kl[Q[kll= s
Theorem 1 states the property of LeastD that allows it to be computed using a dynamic programming
approach.
Theorem 1 For each n, I 5 n 5 63, and each 8, 0 5 s 5 MAXRATE, k t D ( n ,s) be the set
1 5 (I 5 MAXQ,
D[n][q] + [Link][n - 1][s‘]
Then,
Authorized licensed use limited to: The University of Manchester. Downloaded on April 19,2010 at [Link] UTC from IEEE Xplore. Restrictions apply.
338
63rd row. To recover the desired quantization table Q from that point, procedure Recoverq is used. This
procedure recovers the quantization table Q for target rate s by setting Q[63] to QChoice[63][s] and then
working its way up the rows as follows. For going from row number n to row number n 1, s is decremented
~
4 Analysis of RD-OPT
In this section we discuss the running time of RD-OPT and the errors resulting from discretization
4.1 Complexity
G a t h e r s t a t s runs in time about that required to apply the D C I once to each block in the image. F i l l D
and F i l l R each run in time less than a constant times 64 x MAXQ x VMAX. F i l l L e a s t D runs in time less than
a constant times 64 x MAXQ x MAXRATE. In any practical implementation, these times can be substantially
reduced. T h e necessary optimizations are straightforward and are omitted here for clarity.
In the most common case, RD-OPT will be asked to optimize PSNR For M = 255, the reported PSNR, P,,
will be:
2552
P,= 10log,, ~
D ( I e .Q )
Hence, the actual PSNR, P ,is bounded as follows:
P,- 2010g10(l+ ,(PE)) 5 P 5 P,- 2010g10(l - a ( P c ) ) ,
where m(Pc)= 1020
T h e total error in R ( Q ) is a t most
64 x o ~/BPPSCALE
plus the error in estimating rate as the coefficient-wise sum of entropies. T h e latter component goes down
as R ( Q )increases, and is typically around 0.02bits per pixel [RFVK94].
5 Performance results
We have implemented RD-OPT in C on various platforms. Figure 1 shows the performance of RD-OPT
running on an IBM POWERstation 370. The test image used was the well-known 512 x 512 grayscale image
of Lena. T h e figure tabulates running times and accuracy of results for various values of BPPSCALE. In each
case, RD-OPT was asked to compute optimal tables over the range 0 0 to 1.0bits per pixel. These tables
were used to compress the image using the Independent J P E G Group's J P E G compression software. T h e
table shows the actual rate and actual PSNR when Lena was compressed using the quantization table picked
by RD-OPT with a target rate of 0.8 bits per pixel. T h e scaled default J P E G table gave a PSNR of 34.90
dB a t 0 796 bits per pixel.
Authorized licensed use limited to: The University of Manchester. Downloaded on April 19,2010 at [Link] UTC from IEEE Xplore. Restrictions apply.
339
=SCALE Running time Predicted R a t a Actual Rate Predicted PSNR Actual PSNR
(secs.) (bits per pixel) (bits per pixel) (dB)
1000 0.800 0 822 35 809
2!iOO 0.800 0.801 35.726 35.714
0.800 0.798 35.708 35.695
7500 37.6 0.800 0.796 35.705 35.694
10000 0.800 0.795 35.701 35.690
12500 56.0 0.800 0.794 35 700 35.687
15000 G4.6 0.800 0.794 35.699 35.688
20000 82.2 0.800 0.795 35.699 35.687
50000 189.8 0.800 0.794 35.697 35.686
T h e running time in each case includes the time spent in G a t h e r S t a t s , F i l l D , and F i l l R , which is
indcpendeiit of BPPSCALE, and is about 10.91 seconds. Out of 10.91 seconds, about 2.5 seconds are spent
in G a t h e r S t a t s , and the rest in F i l l D and F i l l R . These procedures haven't been optimized in the current
implementation, and we expect to bring the running time down even more, in the near future.
We have used RD-OPT on a wide variety of images, with similar results.
References
[ANR74] Ahmed, N . , Natarajan, T., and Rao, K . R. Discrete Cosine Transform. IEEE Trans. Computers,
C--2390-3,Jan. 1974.
[APSS] Ahumada Jr., A . J. and Peterson, H. f1. Luminance-Model-Based DCT Quantization for Color
Image Compression. Human Vzszon, Vzsual Processzng, and Dzgztal Dzsplay III, B. E'. Rogoulztz, ed.
(Proceedings of the SPIE), 1992.
[Jai89] Jain, A. K . Fundamentals of Digztal Imuge Processzng. Preritice Hall, Englewood Cliffs, NJ, 1989,
[MPBl] MPEG I draft. Coding of Moving Pictures and associated audio for digital storage, 1991. Document
ISO/IEC-CD-11172.
[MSSS] Monro, D . M . and Sherlock, B. G. Optimum DCT Quantization. Proceedzngs of Data Compresszon
Conference, pages 188-194, 1993.
[PM931 Pmnebaker, W. B . and Mitchell, J . L . JFEG Still Image Data Comprrsszon Standard. Van Nostrand
Reinhold, New York, 1993.
[RFVK94] Ratnakar, V., Feig, E., Viscito, E., and Kalluri, S . Runlength encoding of quantized DCT
coefficients IBA4 R C 19693 (87318) 8/5/94 (To appear zn SPIE '95), 1994
[RL94] Ratnakar, V . and Livny, M. Performance of Customized DCT Quantization Tables on Scientific
Data. Science In,formatzon Management and Data Compression Workshop Proceedzngs, NASA
Conference Puhlzcation 3277, pages 1-8, Sept 1994.
[RYSO] Rao, K . R. and Yip, P. Dzscrete Cosznc Transform. Algorzthms, Advantages, Applicatzons Aca-
demic Press, Inc, San Diego, California, 1990.
[VB67] Van Ness, F . I. and Bouman, M . A . Spatial Modulation Transfer in the Human Eye. Journal of
the Optical Society of Amerzca, 57(3):40 (-406, March 1967.
[WatSS] Watson, A . B. DCT quantization matrices visually optimized for individual images. Human Vwion,
Visual Processzng, and Dzgztal Dzsplay I V , B. E. Rogowztz, ed. (Proceedzngs of the SPIE), 1993.
Authorized licensed use limited to: The University of Manchester. Downloaded on April 19,2010 at [Link] UTC from IEEE Xplore. Restrictions apply.
340
A RD-OPT Procedures
Procedure Gatherstats
Input: Image I
Output: Array OccursCount[O..631[-2VMAX..ZVMAX]
Procedure FillR
Procedure FillD
Authorized licensed use limited to: The University of Manchester. Downloaded on April 19,2010 at [Link] UTC from IEEE Xplore. Restrictions apply.
341
Procedure F i l l L e a s t D
/* I n i t i a l i z a t i o n s */
1 . For n := 0 t o 63
2. For s := 0 t o MAXRATE
3. LeastD En] [SI := INFINITY
/* F i l l row number z e r o */
4 . For q : = 1 t o MAXQ
5 . I f ( D CO1 [ql < L e a s t D CO1 [R CO1 Cql1 then
6. LeastDCOl [RCOl [qll := DE01 [ql
7. QChoiceCOl CRCOl [ q l l := q
/* Main loop */
8 . For 'n := 1 t o 63
9. FOK q : = 1 t o MAXQ
10. .For s ' := 0 t o MAXRATE
11. If (D[nl [ql + LeastDIn-11 [s'] .< LeastDCnl [ s ' + RCnl c q l l ) Then
12. s := s ' + RCnlCql
13. LeastDCn] [SI := DCnl Cql + Le,astDCn-11 Cs'l
14. QChoiceCnl [SI := q
Procedure Recover9
1. For n : = 63 downto 0
2. QCnl : = QChoiceCnl [SI
3. s : = s - R[nl[Q[nll
B Proof of Theorem 1
Suppose D ( n , s ) is empty or m i n D ( n , s ) is 03. 'Then, for every y (1 5 y 5 MAXQ), either R[n][p] > s , or
LeastD[n - l][s - R[n][q]] = 00. In either case, the rate s cannot be achieved from coefficients 0 through n,
implying LeastD[n][s] = W .
Now assume D ( n ,.s) is non-empty and that d IS the minimum value in D ( n , s), achieved by setting &[n]
to q . Assume LeastD[n][s] = d' < d . Then the distortion d' must be achieved with some value, say q' for
&[.I. Let d" = d' - D[n][q']. Then the distortion d" must be achievable from coefficients 0 through n - 1,
with thcir rate being exactly equal to s - R[n][q']. But then, d" = LeastD[n - 1][s- R[n][q']], as otherwise d'
+
can be improved, contradicting d' = LeastD[n][s]. Hence d' = D[n][q'] LeastD[n - 1][s - R[n][q'] implying
d' E D ( n ,s). Thus, d 5 d', which conhadicts d' < d . 0
Authorized licensed use limited to: The University of Manchester. Downloaded on April 19,2010 at [Link] UTC from IEEE Xplore. Restrictions apply.