
Efficient Content Extraction In Compressed Images

W. Brent Seales, C. J. Yuan, Michael Brown
Computer Science Department
University of Kentucky
Lexington, Kentucky 40506

Abstract

In many contexts it is desirable for users to be able to extract content directly from image and video data via queries formed dynamically on-line. Queries over precomputed features of an image set restrict the user's access to what has been computed by an editor or supervisor of the data. This work presents an approach to extracting patterns from images and video efficiently in order to support on-line queries directly on the original image data. Our technique relies on compressed-domain processing, where we perform content extraction on image or video data that are compressed with standard algorithms such as JPEG, MPEG or wavelet schemes.

1 Introduction

The problem of searching for patterns in images and video is central to providing users intelligent access to large image archives. In the context of on-line querying of the data, the computational efficiency of the search is a primary concern, and is traded off with the accuracy of the search. In this paper we describe a method for matching patterns over large sets of image data. There are two key features of our approach. First, we search for patterns in the compressed domain without attempting to impose a special-purpose (or domain-dependent) compression scheme. This allows integration with existing compression standards for large data archives, which are supported by general-purpose and widely-available hardware. Second, we include the parameters of lossy compression in the search process. These parameters can be controlled by an archive editor or the user, and allow a measure of control over the tradeoff between search time and the accuracy of the results. It is often possible to obtain very accurate query results with significantly degraded data.

Researchers have laid an important foundation for extracting low-level features, performing visual effects, and doing low-level segmentation (cut detection and bandwidth allocation) in compressed video [18].

The idea of processing data without decompressing it is attractive and has been explored by researchers interested in real-time visual effects such as fading and warping. Smith and Rowe [19] outline the basic approach for applying image operators to JPEG-compressed images. They show large speed gains for certain operations by avoiding decompression. Rao [3] formulates the equivalent of the convolution theorem for the DCT. He shows that image operations such as high-pass filtering can be done by manipulating the DCT values directly. Shen and Sethi [17] perform edge detection in the compressed domain, showing that a coarse edge map can be obtained up to 20 times more efficiently. In terms of video segmentation and content analysis, the problem of scene-change, or "cut" detection, has been solved in the compressed domain [2]. There are also algorithms for resource bandwidth allocation [7] which rely on predictions using cues extracted from compressed video.

The work of the Photobook project by Pentland et al. [12, 11] uses compression schemes that preserve semantic content and are thus searchable. The compression algorithms are devised for particular kinds of imagery (e.g., faces), and within this context the Photobook system demonstrates excellent performance. The Photobook project's central idea of extracting content from data that has been compressed is a powerful one and follows the trend of operating as much as possible in the compressed domain. The idea of "semantic preserving compression" distinguishes our work and the Photobook project from other image database schemes which use extracted cues but do not preserve all the semantics of the image data [9]. Our work seeks to extract content from data that have been compressed using standard algorithms such as JPEG and MPEG, which are lossy but preserve image semantics. This differentiates our formulation from that of Photobook, where the compression algorithms are tailored toward the specific content of the image.

The distinction is similar to compression algorithms themselves, where greater compression can sometimes be gained by using a special-purpose algorithm that performs very well for a particular class of images.

In Section 2 we formulate the mathematics of our approach, relate that to the JPEG/MPEG compression schemes, and summarize the implied requirements and constraints. Section 3 briefly describes the implementation of a testbed system and then shows experimental results. The results attempt to quantify the positive and negative aspects of using our approach in practice. Finally, Section 4 provides a summary of our contributions.

2 Mathematical Formulation

The goal is to perform efficient pattern matching in an archive of data that is compressed. There are two obvious approaches. First, the archive can be stored in a form that is convenient for the search, either in a custom format or as a secondary archive where key features have already been extracted. Second, the raw, uncompressed pixel data can simply be searched with classical techniques. We attempt to avoid the first approach because it may be important to preserve the data in a canonical form independent of the types of queries to be performed or the specific content of the data. It may also be considered restrictive to limit searches only to a secondary, precomputed set of extracted features because this potentially restricts the user's ability to query the primary data directly. The second approach, using raw data directly, is sure to be computationally expensive and may be unsuitable for on-line querying.

The key idea of this paper is to exploit a reasonable assumption about the form of the archived data: we assume that the archived data will be stored in compressed form, using a transform-coded compression scheme. MPEG, JPEG and wavelet methods are all transform-coding schemes. This assumption, in most cases, is very reasonable for large image and video archives. Given image data that are already compressed, it is advantageous to process them directly without needlessly expending resources to recover pixel information. Transforming image data back to the pixel domain expands the data size and forfeits some of the advantages that are only present when dealing with transform-coded data. Note that it is necessary, eventually, to recover the pixel information for a small number of images for the purpose of end-user display. Even so, many more images must be examined in a typical search, and over that large data set there is a substantial savings in avoiding or delaying the final decoding until search results are finalized.

Our purpose is to search for patterns which can be accurately and efficiently extracted from the data in its compressed form. We derive a measure of match from the set of principal components computed from an image set. It is interesting to apply the principal component method at a point after part of the compression transformation, rather than in the pixel domain. It is straightforward to show that the "distance in eigenspace" measure for pattern matching with images is preserved under linear, orthogonal transformations. This implies that the principal component method gives exactly the same measure of match on transformed data as on pixel-domain data. Perturbation of the measure of match occurs as a result of the quantization of the transformed data, which is the key part of the loss that is introduced in transform-coded compression schemes.

We use the principal component method on transform-coded data and expect exactly the same quality of classification results as when performing the principal component method on the pixel data. This will be true if the transformation is orthogonal and there is no subsequent quantization of the transformed data. Lossy coding methods, however, will also introduce perturbations from quantization in the principal components which are difficult to quantify. But we expect matching measurements to be much more efficient to perform on the transform-coded data because (1) the decompression costs are skipped and (2) the transform vectors are usually sparse as a result of the quantization and coefficient reordering.

2.1 Principal Components and JPEG

We will use the JPEG standard as an example. We have applied our approach to wavelet transformations [6] and to index frames (I-frames) of MPEG video as well [13]. The JPEG standard consists of the following key steps: (1) pixel range shifting; (2) discrete cosine transform (DCT) on 8 x 8 pixel blocks; (3) DCT quantization; (4) DCT "zig-zag" ordering; (5) run-length coding; and (6) entropy coding. Applying all the steps gives a fully-compressed JPEG image. Motion JPEG is essentially a continuous sequence of JPEG frames, and MPEG I-frames are similarly encoded.

Our algorithms operate on run-length-encoded (RLE) vectors, which is the data after step (5). Very little can be done with the data after step (6) since it is no longer byte-aligned. Refer to [20, 8] for more detail regarding the JPEG and MPEG compression algorithms.

We develop the method of principal components [5] by defining the input as a set of k images {f_1, f_2, ..., f_k}, represented as vectors, where each image of dimension n x m has N = n x m pixels. We treat an image as a vector by placing the pixel values in scan-line order. First we subtract from each input vector the average value

    m = (1/k) \sum_{i=1}^{k} f_i    (1)

of the input vector set.

Now the covariance matrix C of the mean-adjusted input vectors f is

    C_{N x N} = U U^T    (2)

where the elements of U are written as

    U = [f_1 - m, f_2 - m, ..., f_k - m]_{N x k}    (3)

The principal components are the eigenvectors and eigenvalues of C:

    \lambda_i A_i = C A_i    (4)

Although the dimension of C is N x N, its rank is k and hence this decomposition produces k non-negative eigenvectors {A_1, A_2, ..., A_k} and their associated eigenvalues {\lambda_1, \lambda_2, ..., \lambda_k}. We let A be a matrix whose rows are formed from the eigenvectors A_i of C, and we order the rows so that the first row is the eigenvector corresponding to the largest eigenvalue and the last row is the eigenvector corresponding to the smallest eigenvalue:

    A = [A_1; A_2; ...; A_k]    (5)

Because C is real and symmetric, the eigenvectors are complete and orthogonal, forming a basis that spans the k-dimensional space.

The matrix of ordered eigenvectors A can be viewed as a matrix that projects input vectors, adjusted by the mean vector m, into eigenspace:

    p = A(f - m)    (6)

Given two vectors that are projected into eigenspace, the closer the projections in eigenspace, the more highly correlated the original vectors [10]. In particular, the normalized cross-correlation r_{12} between two vectors x_1 and x_2 of the same dimension is given by

    r_{12} = (x_1 . x_2) / (||x_1|| ||x_2||)    (7)

The distance between two vectors x_1 and x_2 is

    ||x_1 - x_2||^2 = (x_1 - x_2)^T (x_1 - x_2)    (8)

which becomes

    ||x_1 - x_2||^2 = ||x_1||^2 + ||x_2||^2 - 2 x_1 . x_2    (9)

Eq. (9) relates the Euclidean distance between two vectors x_1 and x_2 to their cross-correlation, which is the term x_1 . x_2. When ||x_1|| = ||x_2|| = 1, Eq. (9) becomes

    ||x_1 - x_2||^2 = 2(1 - x_1 . x_2)    (10)

This shows that the squared distance between two input vectors is a measure of the correlation between those vectors as defined by Eq. (7). Vectors that are close to each other are highly correlated.

Now suppose that the input to the eigenspace method is a set of JPEG-compressed images {\hat{f}_1, \hat{f}_2, ..., \hat{f}_k} generated from a set of intensity images {f_1, f_2, ..., f_k}. We assume that we have vectors at step (5) in the JPEG algorithm. Thus the discrete cosine transform (DCT) coefficients represented by the data between steps (5) and (6) are quantized, are in zig-zag (frequency) order, and are run-length encoded (RLE vectors). We perform the inverse quantization directly on the RLE vector, which can be done without run-length decoding or reordering the coefficients. Note that the particular coefficient order is not important, only that a consistent choice be made. Our performance results in Section 3 show the substantial gains obtained by using dequantized RLE vectors.

The steps in computing the eigenspace for this set of input vectors are the same as before. Using Eqs. (1)-(5) on the set of input vectors {\hat{f}_1, \hat{f}_2, ..., \hat{f}_k} we obtain a set of eigenvectors {\Psi_1, \Psi_2, ..., \Psi_k} and their associated eigenvalues {\psi_1, \psi_2, ..., \psi_k}. As before (Eq. 5), we form the transformation matrix \Psi from the eigenvectors of the covariance matrix:

    \Psi = [\Psi_1; \Psi_2; ...; \Psi_k]    (11)

The key observation in dealing with vectors that have been transformed with the DCT is that the DCT is a distance-preserving linear transformation:

    ||f_1 - f_2|| = ||\hat{f}_1 - \hat{f}_2||    (12)

This means that we should be able to match patterns in the transformed data, expecting the same measure of match to hold.
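The construction of Eqs. (1)-(6) and the distance-preservation property of Eq. (12) can be illustrated with the short sketch below. It is a sketch only, assuming images held as NumPy arrays and an orthonormally scaled DCT; the helper names are illustrative and not taken from the prototype described in Section 3.

    import numpy as np
    from scipy.fftpack import dct

    def orthonormal_dct2(block):
        # 2-D type-II DCT with orthonormal scaling, so Euclidean
        # distances are preserved exactly (Eq. 12).
        return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

    def build_eigenspace(vectors, n_components):
        # Eqs. (1)-(5): mean-adjust, form the covariance implicitly,
        # and keep the eigenvectors of the largest eigenvalues.
        X = np.stack(vectors)              # k x N matrix of image vectors
        m = X.mean(axis=0)                 # Eq. (1)
        U = (X - m).T                      # N x k, columns f_i - m (Eq. 3)
        # For N >> k, the eigenvectors of the small k x k matrix U^T U
        # yield those of C = U U^T after mapping through U.
        w, v = np.linalg.eigh(U.T @ U)
        order = np.argsort(w)[::-1][:n_components]
        A = (U @ v[:, order]).T            # rows = principal eigenvectors
        A /= np.linalg.norm(A, axis=1, keepdims=True)
        return m, A

    # Distances computed on DCT coefficients match pixel-domain distances:
    f1, f2 = np.random.rand(8, 8), np.random.rand(8, 8)
    d_pix = np.linalg.norm(f1 - f2)
    d_dct = np.linalg.norm(orthonormal_dct2(f1) - orthonormal_dct2(f2))
    assert np.isclose(d_pix, d_dct)        # Eq. (12)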

2.2 Quantization

The quantization step of most compression algorithms gives the largest reductions in data size. But quantization causes data loss and introduces errors between a compressed image and the original. As noted earlier, in the JPEG algorithm the DCT values are quantized to compress the data. This quantization is lossy, and the question we must address is "How does progressively larger, lossy quantization affect the structure of the eigenspace that will be used to classify images?". We want the structure of eigenspace to be perturbed slowly and smoothly as a function of increasing loss due to quantization. In this section, we show that there are tight bounds on how the eigenvalues and eigenvectors of the covariance matrix C will change when we introduce perturbations due to quantization. We have verified these bounds with empirical results [13].

Let the input to the classification method be a set of compressed, mean-adjusted images {\hat{f}_1, \hat{f}_2, ..., \hat{f}_k} generated from a set of intensity images {f_1, f_2, ..., f_k} of dimension N = n x m. Quantization introduces errors into the vectors {\hat{f}_1, \hat{f}_2, ..., \hat{f}_k}, and we model the error in each vector as

    \hat{f}_i = f_i + \delta_i    (13)

Now we write the covariance matrix as

    \tilde{C} = \tilde{U} \tilde{U}^T    (14)

where \tilde{U} is written as

    \tilde{U} = [\hat{f}_1, \hat{f}_2, ..., \hat{f}_k]_{N x k}    (15)

and each vector \delta_i represents the perturbation introduced into \hat{f}_i as a result of quantization. We now let

    \Delta_{N x k} = [\delta_1, \delta_2, ..., \delta_k]    (16)

so that

    \tilde{C} = \tilde{U} \tilde{U}^T = (U + \Delta)(U^T + \Delta^T)    (17)

This expands to

    \tilde{C} = U U^T + U \Delta^T + \Delta U^T + \Delta \Delta^T    (18)

which can be rewritten as

    \tilde{C} = C + (U \Delta^T + \Delta U^T + \Delta \Delta^T) = C + E    (19)

When ||E||_2 / ||C||_2 is small we can bound the change in the eigenvectors and eigenvalues of C. In particular, since both C and E are symmetric, according to the Wielandt-Hoffman theorem [4]

    \sum_{i=1}^{N} (\tilde{\lambda}_i - \lambda_i)^2 <= ||E||_F^2    (20)

where

    ||E||_F = ( \sum_{i,j=1}^{N} |e_{ij}|^2 )^{1/2}    (21)

and \lambda_i is the ith eigenvalue of C and \tilde{\lambda}_i is the ith eigenvalue of \tilde{C}. Thus the difference between the eigenvalues of C and \tilde{C} is bounded by the order of the largest element of E, while the elements of E are bounded directly by the JPEG quantization matrix and scale factor.

The change in the eigenvectors of C is also bounded by the magnitude of E. In particular, the angular difference between the perturbed unit eigenvector \tilde{A}_i from \tilde{C} and the eigenvector A_i from C satisfies the expression [1]

    \theta(\tilde{A}_i, A_i) <= c ||E||_2 / gap_i    (22)

where c is a small constant, \theta(\tilde{A}_i, A_i) is the angle between \tilde{A}_i and A_i in radians, and

    gap_i = min_{j != i} |\lambda_i - \lambda_j|    (23)

is the absolute value of the gap between \lambda_i from C and its nearest neighboring eigenvalue. When duplicate eigenvalues arise, the expression for the bound on the perturbation in eigenvectors is similar, and measures the change between subspaces rather than the angle between single vectors.

In summary, the change in the structure of eigenspace as a function of the perturbations introduced by quantization is strongly bounded and well-behaved. The bounds are a result of the form of C, which is a symmetric covariance matrix. These error bounds guarantee that distance in eigenspace, which is the basis for classification, will be perturbed slowly and will yield good classification results over a significant range of quantizations.
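A minimal numerical check of the eigenvalue bound of Eqs. (20)-(21) follows. The matrices here are random stand-ins for a covariance matrix and a quantization perturbation, not data from our experiments.

    import numpy as np

    def wielandt_hoffman_check(C, E):
        # Compare sum_i (lambda~_i - lambda_i)^2 against ||E||_F^2
        # for symmetric C and symmetric perturbation E (Eqs. 20-21).
        lam = np.linalg.eigvalsh(C)            # eigenvalues, ascending
        lam_pert = np.linalg.eigvalsh(C + E)
        lhs = np.sum((lam_pert - lam) ** 2)
        rhs = np.linalg.norm(E, 'fro') ** 2
        return lhs, rhs

    rng = np.random.default_rng(0)
    U = rng.standard_normal((64, 10))
    C = U @ U.T                                # symmetric, as in Eq. (2)
    E = 0.01 * rng.standard_normal((64, 64))
    E = (E + E.T) / 2                          # symmetrize, per Eq. (19)
    lhs, rhs = wielandt_hoffman_check(C, E)
    assert lhs <= rhs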

Figure 1: (top) This page of the Beowulf manuscript (folio 133 recto) was used as a training set for the Old English letter forms "d", "r", "ea", and "m". Using these training images we detect letter forms on other pages of the manuscript. (bottom) The result of the detection of the word "dream" as a combination of its individual letters (folio 130 verso).

These results imply that adjusting the quantization factor of compressed images allows the data size to be reduced to a point that optimizes compression ratio and classification rate. The ability to tune data loss versus classification performance is potentially very valuable. As a result, data stored in archives at fairly high quantization rates is likely to contain enough information to allow very efficient searching and matching, with large gains in performance over the same operations in the pixel domain. In addition, even at low compression rates it is still more efficient to operate in the compressed domain because classification distances are equivalent, decompression can be avoided, and sparse data consumes less disk and memory.

3 Experimental Analysis

A prototype tool has been developed which implements the methods described in this paper. The tool reads JPEG images or MPEG frames and partially decompresses them, quantizes as desired, and allows the principal component method to be applied at various places in the decompression pipeline. We are using the tool to examine a variety of applications [16, 15].

In the experiment shown in Fig. 1, we use high-resolution images of the Beowulf manuscript supplied by the British Library [14]. The goal is to detect letter forms and use the combination of letter forms in order to extract arbitrary words and phrases across the 77-page manuscript. Each page of the manuscript has been digitized at a high resolution of detail, creating a very large digital archive (approximately 14 MB per page).

The example in Fig. 1 creates a training set for the letter forms "d", "r", "ea", and "m" in order to search for the word "dream". The top image was used as a template base for constructing the training sets. The bottom image shows the result of the search. Twenty eigenvectors were used for each template. Fig. 2 gives the time savings as a function of the JPEG quality factor of the data. The recognition task runs twice as fast, giving the same classification results as when applied to the raw (uncompressed) data. In particular, for the quality values shown in Fig. 2 the classification results were identical. This supports the mathematical discussion in Section 2, which argues that the eigenspace distance measures are robust over a large range of perturbations from quantization.

In the case of this example, decompression time is only a small part of the total computation time, which is dominated by the search and projection into eigenspace. In cases where a much smaller number of templates from the image are to be projected into eigenspace, such as tracking in video frames, the decompression time plays a much larger role and begins to dominate the cost. We have found in such cases that the computation times can be up to five times faster when operating on compressed data.
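The search loop itself can be pictured with the hypothetical sketch below; it is not the prototype's code. It assumes candidate windows have already been expressed as dequantized transform-domain vectors, projects each into eigenspace (Eq. 6), and reports locations whose distance to a template projection falls below a threshold.

    import numpy as np

    def search_image(windows, m, A, template_projections, threshold):
        # windows: mapping of image location -> transform-domain vector.
        # template_projections: label -> eigenspace projection of template.
        hits = []
        for loc, x in windows.items():
            p = A @ (x - m)                    # project, Eq. (6)
            for label, q in template_projections.items():
                if np.linalg.norm(p - q) < threshold:
                    hits.append((loc, label))
        return hits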
4 Summary

In this paper we have shown that significant speedups are possible in solving the problem of matching patterns in compressed image data. The speedups are available by operating on the data while it is compressed. Without errors from quantization, which is involved in most lossy-coding schemes, the classification results are exact. But even with significant levels of quantization, the eigenspace which serves as the classifier remains robust and performs well. When the eigenspace projection dominates the computation time, as is the case when a small template is matched at all locations in a large image, we achieve a speedup of a factor of two. In other experiments, where a small number of image templates from each compressed image are projected to eigenspace, the decompression time plays a significant role as well and leads to speedups on the order of a factor of five.

Acknowledgments

We gratefully acknowledge the support of the National Science Foundation for this research under grants IRI-9308415, CDA-9320179 and CDA-9502645.

References

[1] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen. LAPACK Users' Guide, Second Edition. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1995.

[2] F. Arman, A. Hsu, and M.-Y. Chiu. Image processing on compressed data for large video databases. In Proceedings of the First ACM International Conference on Multimedia, August 1993.

[3] B. Chitprasert and K. Rao. Discrete cosine transform filtering. Signal Processing, 19(3):233-245, 1990.

[4] G. Golub and C. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1984.

[5] R.C. Gonzales and R.E. Woods. Digital Image Processing. Addison-Wesley, 1993.

[6] W. Hu and W.B. Seales. Biorthogonal wavelets and object recognition in the compressed domain. Technical report, Computer Science Dept., University of Kentucky, Lexington, Kentucky, 1997.

[7] K. Lakshman and R. Yavatkar. An empirical evaluation of adaptive QoS renegotiation in ATM networks. In Proc. 6th Intl. Workshop on Network and Operating System Support for Digital Audio and Video, April 1996.

    Decompression   Searching    JPEG Quality   Method   Size
    Time (sec)      Time (sec)   Factor                  (bytes)
     5               6190          2            RLE          9393
     6               6207          3            RLE         10672
     6               6273          5            RLE         15084
     7               6383         10            RLE         26741
     8               6534         15            RLE         36612
     9               6669         20            RLE         48171
    10               6797         25            RLE         63328
    19               7705         50            RLE        129601
    24               8103         75            RLE        174351
    66              11732         75            RAW       1571328

Figure 2: The timing results for the letter-form search of a page of the Beowulf manuscript show that search times are up to twice as fast using the compressed (RLE) data.

[8] D. Le Gall. MPEG: A video compression standard for multimedia applications. Communications of the ACM, 34(4):47-58, April 1991.

[9] D. Lee, M. Flickner, R. Barber, J. Hafner, W. Niblack, and D. Petkovic. Indexing for complex queries on a query-by-content image database. In Proc. Int. Conf. Pattern Recognition, pages 142-146, 1994.

[10] H. Murase and S. Nayar. Visual learning and recognition of 3-D objects from appearance. International Journal of Computer Vision, 14:5-24, 1995.

[11] A. Pentland, B. Moghaddam, and T. Starner. View-based and modular eigenspaces for face recognition. In Proc. IEEE Computer Vision and Pattern Recog., 1994.

[12] A. Pentland, R.W. Picard, and S. Sclaroff. Photobook: Content-based manipulation of image databases. Int. Journal of Computer Vision, to appear, 1996.

[13] W.B. Seales, M.D. Cutts, and W. Hu. Content analysis of compressed video. Technical Report 265-96, Computer Science Dept., University of Kentucky, Lexington, Kentucky, 1996.

[14] W.B. Seales, J. Griffioen, and R. Yavatkar. Content-based multimedia data management and remote access. Journal of the Association for Computers and the Humanities, 1997.

[15] W.B. Seales, J. Lumpp, and M. Brown. Distributed wireless stereo reconstruction. In IEEE Aerospace Applications Conference, 1997.

[16] W.B. Seales, J. Lumpp, and C.J. Yuan. Model-based tracking in compressed video. In IEEE Aerospace Applications Conference, 1997.

[17] B. Shen and I.K. Sethi. Direct feature extraction from compressed images. In Proc. International Society for Optical Engr. (SPIE), Storage and Retrieval for Image and Video Databases IV, October 1996.

[18] B. Smith. Survey of compressed domain processing techniques. In Reconnecting Science and Humanities in Digital Libraries, a Digital Library Symposium at the University of Kentucky, Oct 1995.

[19] B. Smith and L. Rowe. Algorithms for manipulating compressed images. IEEE Computer Graphics and Applications, 13(5):34-42, Sept 1993.

[20] G. Wallace. The JPEG still-picture compression standard. Communications of the ACM, 34(4):31-44, April 1991.

