IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 10, NO. 7, NOVEMBER 2008

Document Image Processing for Paper Side Communications

Paulo Vinicius Koerich Borges, Student Member, IEEE, Joceli Mayer, Member, IEEE, and Ebroul Izquierdo, Senior Member, IEEE

Abstract: This paper proposes the use of higher order statistical moments in document image processing to improve the performance of systems which transmit side information through the print and scan channel. Examples of such systems are multilevel 2-D bar codes and certification via text luminance modulation. These systems print symbols with different luminances, according to the target side information. In previous works, the detection of a received symbol is usually performed by evaluating the average luminance or spectral characteristics of the received signal. This paper points out that, whenever halftoning algorithms are used in the printing process, detection can be improved by observing that the third and fourth order statistical moments of the transmitted symbol also change, depending on the luminance level. This work provides a thorough analysis of those moments used as detection metrics. A print and scan channel model is exploited to derive the relationship between the modulated luminance level and the higher order moments of a halftone image. This work employs a strategy to merge the different moments into a single metric to achieve a reduced detection error rate. A transmission protocol for printed documents is proposed which takes advantage of the higher robustness achieved with the combined detection metrics. The applicability of the introduced document image analysis approach is validated by comprehensive computer simulations.

I. INTRODUCTION

PRINTED paper and its corresponding document image processing are critical for the storage, presentation, and transmission of analogue and digital information. In addition to conventional text and images, paper communications include, for example, bar codes, bank and identity documents, and hardcopy text certification.
Regarding bar codes, multilevel 2-D bar codes [1], [2] have gained increased attention in the past few years. Instead of representing information with only black and white symbols, multilevel codes use gray levels to increase the symbol rate, as illustrated in Fig. 1(a). Consequently, a higher capacity version of 1-D bar codes is achieved.
Manuscript received March 15, 2007; revised May 28, 2008. Current version published November 17, 2008. This work was supported by CNPq, Procurement 202288/2006-4. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Xiaolin Wu.
P. Borges is with the LPDS, Department of Electrical Engineering, Federal University of Santa Catarina, Florianópolis 88.040-900, Brazil, and also with the Multimedia and Vision Lab., Department of Electronic Engineering, Queen Mary University of London, London E1 4NS, U.K. (e-mail: vini@ieee.org; paulo.vinicius@elec.qmul.ac.uk).
J. Mayer is with the LPDS, Department of Electrical Engineering, Federal University of Santa Catarina, Florianópolis, SC 88.040-900, Brazil (e-mail: mayer@eel.ufsc.br).
E. Izquierdo is with the Multimedia and Vision Laboratory, Department of Electronic Engineering, Queen Mary University of London, London E1 4NS, U.K. (e-mail: ebroul.izquierdo@elec.qmul.ac.uk).
Digital Object Identifier 10.1109/TMM.2008.2004906

Fig. 1. Illustration of side communications over paper. (a) Multilevel 2-D bar
code. (b) Example of text certification through luminance modulation.

Examples of their utilization include representing encrypted information or serving as an auxiliary verification channel. A discussion of further applications and of aspects related to the coding/decoding of 2-D bar codes is given in [1], [2].
With respect to hardcopy text certification, several techniques have been proposed in the literature. Brassil et al. [26] propose and discuss several methods to embed and decode information in documents which can survive the print and scan (PS) channel. In one of the methods, called line-shift coding, a line is moved up or down according to the bit to be embedded. In order to perform blind detection, line centroids can be used as references. One disadvantage of this method is that the centroids must be uniformly spaced, which does not always occur in documents. Variations of the method include word and character shift coding [27], [23], [25], but they are essentially different implementations of the fundamental idea. Unfortunately, the line and word shifting techniques assume predictable spacing in the document. Equations, titles, and variable size logos or symbols complicate the coding process due to nonuniform spacing.
An alternative class of methods (called pixel flipping) performs modifications on the characters' pixels [28], [13], such as flipping a pixel from black to white, and vice versa. In [30], for example, the modifications are performed according to the shape and connectivity of the characters. Under weak noise conditions, this type of method presents a very high information embedding rate; however, since it relies on small dots, it requires very high resolutions in both printing and scanning to reduce detection errors once the PS distortions are considered. For these pixel flipping techniques, a useful detection statistic when the signal is submitted to the PS process is proposed in [31], based on the compression bit rate of the modified signals.
Another important technique is called text luminance modulation (TLM) [3]-[5]. It slightly modulates the luminance of text characters to embed side information. This modification is performed according to the target side information and can be


set to cause a very low perceptual impact while remaining detectable after printing and scanning. An example of this technique is given in Fig. 1(b), where the intensity changes have been augmented to make them visible and to illustrate the underlying process. In contrast to some of the limitations of the methods detailed above (nonblind detection, need for uniform spacing between lines, errors from segmentation, etc.), TLM can be designed to be more robust to these issues, as discussed in [6].
Due to the usefulness of TLM and multilevel 2-D bar codes, this work focuses on improving the detection in such systems. The contributions presented in this paper are as follows.
1) In TLM and multilevel 2-D bar codes, extraction of the embedded side information is usually performed by evaluating the average amplitude [1], [5], [6] or spectral characteristics [5] of the region of interest. However, considering that halftoning is usually employed in the printing process, other statistics of the received signal can also be exploited in the detection. One example is the sample variance, which can be effectively used as a detection metric, as proposed in [3]. In this work, higher order statistical moments such as the skewness and the kurtosis are used as detection metrics, extending the work presented in [3].
2) The provided analysis shows the relationship between the modulated luminance and the higher order statistics of a printed and scanned halftone region. Statistical assumptions related to the first and second order moments and to the PS channel model are based on [3]. This model includes the quantization characteristics of the halftoning process, which are exploited here to derive the higher order statistics based detection metrics.
3) Robustness against PS distortions, and consequently a reduced detection error rate, is achieved by combining the proposed metrics into a single highly efficient metric. This unified model allows previously proposed metrics [5], [6] to be combined with the ones proposed in this paper.
4) A practical protocol for the certification of printed documents is proposed which exploits the resulting high robustness of the proposed unified metrics.
This paper is organized as follows. Section II describes the
halftoning process and a PS model, which can be seen as a
noisy communications channel. Section III analyzes the relationship between the different moments and the modulated average luminance before PS. Section IV performs a similar analysis considering the PS distortions. Section V proposes that the
discussed metrics be combined into a single metric using the
Bayes classifier. Section VI proposes an authentication protocol
for printed text documents. Experimental results are presented
in Section VII, followed by conclusions in Section VIII.
II. PRINT AND SCAN CHANNEL
A. The Halftoning Process
Due to limitations of most printing devices, image signals are quantized to 0 or 1 prior to printing, where 0 represents a white pixel (do not print a dot) and 1 represents a black pixel (print a dot). A binary halftone image x is generated from an original image s, and this quantization is performed through a direct comparison between the elements of a dithering matrix and the elements of s. A detailed description of halftoning algorithms can be found in [7], [8]. The binary signal x is included in the PS channel model described in the next section.
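As a concrete illustration of this comparison, the sketch below implements ordered dithering with a 4x4 Bayer matrix in Python. The matrix and its normalization are standard choices [18], but they are assumptions for illustration, not necessarily the exact dithering used in the experiments of Section VII.

```python
import numpy as np

# 4x4 Bayer dithering matrix, normalized to thresholds in (0, 1).
BAYER4 = (np.array([[ 0,  8,  2, 10],
                    [12,  4, 14,  6],
                    [ 3, 11,  1,  9],
                    [15,  7, 13,  5]]) + 0.5) / 16.0

def halftone(s):
    """Ordered dithering: s is a gray-level image in [0, 1] (1 = white).
    Returns a binary image x where 1 = print a dot, 0 = do not print."""
    rows, cols = s.shape
    # Tile the dither matrix over the whole image and threshold.
    d = np.tile(BAYER4, (rows // 4 + 1, cols // 4 + 1))[:rows, :cols]
    return (s < d).astype(np.uint8)

# A constant region of luminance s produces dots with probability ~ 1 - s.
x = halftone(np.full((32, 32), 0.7))
print(x.mean())  # fraction of black dots, close to 0.3
```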
B. Print and Scan Channel
A number of analytical models of the PS channel have been presented in the literature [1], [2], [9], [6]. In general, the proposed models assume that the process can be described by low-pass filtering, the addition of Gaussian noise (white or colored), and nonlinear gains, such as brightness and gamma alterations. Geometric distortions such as rotation, rescaling, and cropping may also occur, but they are assumed controlled, as they can be compensated for.
The contributions presented in this work are directly related to the effects caused by the halftoning process. For this reason, a PS model which includes the halftoned signal is employed to describe the process. A detailed description of this model is given in [3], where the PS operation is described by

y(i) = γ_s( γ_p [x(i) + η₁(i)] * h(i) ) + η₂(i)    (1)

In this equation, η₁ represents the microscopic ink and paper imperfections, and η₂ is a noise term combining illumination noise and microscopic particles on the scanner surface. The term h represents a blurring effect combining the low-pass effects due to printing and due to scanning. The term γ_p in (1) represents a gain in the printing process, and the term γ_s represents the response of scanners. In the model in (1), x represents the halftone signal, generated from an original signal s, as described in Section II-A.
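A minimal numerical sketch of (1) is given below, with a Gaussian kernel standing in for the blur h and illustrative gain and noise levels; these parameter values are assumptions, not the fitted values used later in Section VII.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

def print_and_scan(x, gamma_p=1.05, blur_sigma=0.7,
                   sigma_eta1=0.02, sigma_eta2=0.02):
    """Sketch of (1): print gain, ink/paper noise eta1, blur h, scan noise eta2.
    The scanner response gamma_s is taken here as linear with unit gain."""
    eta1 = rng.normal(0.0, sigma_eta1, x.shape)   # ink/paper imperfections
    blurred = gaussian_filter(gamma_p * (x + eta1), blur_sigma)  # h * (.)
    return blurred + rng.normal(0.0, sigma_eta2, x.shape)        # + eta2

# A random binary block stands in for the halftone signal x.
x = (rng.random((32, 32)) < 0.3).astype(float)
y = print_and_scan(x)
```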
III. EFFECTS INDUCED BY THE HALFTONE

The halftoning algorithm quantizes the input to "print a dot" or "do not print a dot" according to the dither matrix coefficients. This quantization causes a predictable effect on the variance, skewness, and kurtosis of a halftoned region, according to the input luminance. For the variance, this dependence is discussed in [3]. In the following, this approach is extended to the skewness and the kurtosis, to help improve the detection in printed communications.

A. Skewness

The skewness measures the degree of asymmetry of a distribution around its mean [14]. It is zero when the distribution is symmetric, positive if the distribution shape is more spread to the right, and negative if it is more spread to the left, as illustrated in Fig. 2(a).

Fig. 2. Illustration of the effect of positive and negative skewness and kurtosis. (a) Skewness. (b) Kurtosis.

The skewness of a halftone block x of size M × M is given by

γ = [ (1/N) Σ_i (x(i) − m_x)³ ] / [ (1/N) Σ_i (x(i) − m_x)² ]^(3/2)    (2)

where m_x = (1/N) Σ_i x(i) and N = M² is the number of coefficients in the dithering matrix. Since x(i) ∈ {0, 1}, x(i)ᵏ = x(i) for any k ≥ 1, and (2) can be written as

γ = (1 − 2 m_x) / [ m_x (1 − m_x) ]^(1/2)    (3)

B. Kurtosis

The kurtosis is a measure of the relative flatness or peakedness of a distribution about its mean, with respect to a normal distribution [14]. A high kurtosis distribution has a sharper peak and flatter tails, while a low kurtosis distribution has a more rounded peak with wider shoulders, as illustrated in Fig. 2(b).

The kurtosis of a halftone block of size M × M is given by

κ = [ (1/N) Σ_i (x(i) − m_x)⁴ ] / [ (1/N) Σ_i (x(i) − m_x)² ]²    (4)

To derive the skewness γ(s) and the kurtosis κ(s) as a function of the input luminance s, x must be generated from a constant gray level region, that is, s(i) = s, where s is a constant. Assuming that the dithering matrix coefficients are approximately uniformly distributed, as illustrated in Fig. 3, the probability p of x(i) = 1 is given by

p = Pr{x(i) = 1} = 1 − s    (5)

as illustrated by the shaded area in Fig. 3.

Fig. 3. Representation of the uniform distribution assumed for the coefficients of the dithering matrix and the distribution of x.

Substituting this result into (3) and (4) yields

γ(s) = (2s − 1) / [ s(1 − s) ]^(1/2)    (6)

κ(s) = [ 1 − 3 s(1 − s) ] / [ s(1 − s) ]    (7)

where γ(s) and κ(s) represent, respectively, the skewness and the kurtosis of a halftoned block that represents a region of constant luminance s. In [3], an analysis similar to the above is presented for the variance, and it is shown that, with the same assumptions, the variance σ²(s) is given by

σ²(s) = s(1 − s)    (8)

C. Comments on γ(s) and κ(s)

The halftone signal x is binary, distributed according to the two-point distribution illustrated in Fig. 3 and taking the values 1 and 0 with probabilities p and 1 − p, respectively. Because the skewness and the kurtosis of x depend on s, these moments can be used as detection metrics in text luminance modulation and multilevel bar codes, in addition to the average and variance metrics employed in [3].

Regarding the skewness, it is equal to zero when s = 0.5 and the distribution of x is symmetric, represented by two peaks of equal probability. The two symmetric peaks also flatten the distribution of y in (1), minimizing the kurtosis. When s > 0.5, x is composed of more white dots than black dots, leaning the distribution of x to the left and causing a positive skewness. The opposite occurs when s < 0.5, yielding a negative skewness. Likewise, the distribution of x becomes more peaky as s approaches the limits of the luminance range, consequently increasing the kurtosis.
IV. EFFECTS INDUCED BY THE PS CHANNEL

The statistical moments described in (3) and (4) are affected by the low-pass characteristic and the noise in the PS channel. Considering these channel distortions, the skewness γ_PS(s) and the kurtosis κ_PS(s) of a printed and scanned symbol are derived in Sections IV-C and IV-D, respectively. Statistical and distortion assumptions for the analyses are discussed in Section IV-A. For simplicity, the 2-D coordinate system is mapped to a 1-D notation.

A. Statistical and Distortion Assumptions

In the model in (1), let η denote the combined noise term, collecting the filtered printing noise and the scanner noise. The noise η is zero-mean with variance σ²_η, and it is distributed according to p_η(η), as illustrated in Fig. 4.

Fig. 4. Distribution of the noise η.

Although γ_s in (1) is generally defined as nonlinear, in many devices it can be approximated by a linear model [1]. This is particularly reasonable in a TLM application, because the detector operates in a small portion of the luminance range due to the low perceptual impact requirement. For this reason, γ_s is assumed linear, and its gain is approximated to 1 for simplicity. Assuming that x is generated from a constant gray level region, that is, s(i) = s, (1) can be written as

y(i) = γ_p (x * h)(i) + η(i)    (9)

The term γ_p represents a gain [see γ_p in (1)] that varies slightly throughout a full page due to nonuniform printer toner distribution. Due to its slow rate of change, γ_p is modeled as constant within a symbol, but it varies from one realization to the next, where the kth realization corresponds to the kth symbol of a 2-D bar code or the kth character in TLM.

Due to the nature of the noise (discussed in Section II) and based on experimental observations, η₁ and η₂ can be generally modeled as zero-mean mutually independent Gaussian noise [1], [11], [2].

B. Variance

Based on the assumptions described above, it is shown in [3] that the sample variance of a scanned symbol is given by

σ²_PS(s) = γ_p² s(1 − s) Σ_i h²(i) + σ²_η    (10)

In the following, an extension to the skewness and to the kurtosis is presented.

C. Skewness

The sample skewness of a scanned symbol y is given by

γ_PS(s) = E[(y(i) − m_y)³] / ( E[(y(i) − m_y)²] )^(3/2)    (11)

where m_y = E[y(i)]. Recalling that η₁ and η₂ are zero-mean mutually independent random variables, and that third order moments of independent and identically distributed zero-mean random variables are zero, (11) becomes

γ_PS(s) = γ_p³ E[ ((x̃ * h)(i))³ ] / ( σ²_PS(s) )^(3/2)    (12)

where x̃(i) = x(i) − m_x. Appendix I derives the term E[((x̃ * h)(i))³] in the equation above, yielding

γ_PS(s) = γ_p³ μ₃(s) Σ_i h³(i) / ( σ²_PS(s) )^(3/2)    (13)

where σ²_PS(s) is described by (10) and μ₃(s) is given by

μ₃(s) = s(1 − s)(2s − 1)    (14)

D. Kurtosis

The sample kurtosis of a scanned symbol is given by

κ_PS(s) = E[(y(i) − m_y)⁴] / ( E[(y(i) − m_y)²] )²    (15)

Appendix II derives the term E[((x̃ * h)(i))⁴] in the equation above, yielding

κ_PS(s) = [ γ_p⁴ β(s) + 6 γ_p² s(1 − s) σ²_η Σ_i h²(i) + 3 σ⁴_η ] / ( σ²_PS(s) )²    (16)

where β(s) is given by

β(s) = μ₄(s) Σ_i h⁴(i) + 3 s²(1 − s)² [ (Σ_i h²(i))² − Σ_i h⁴(i) ],  with μ₄(s) = s(1 − s)[1 − 3 s(1 − s)]    (17)
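Under the assumptions of Section IV-A, the predictions (10), (13), (14), (16), and (17) can be checked by Monte Carlo simulation; the filter taps and noise level in the sketch below are illustrative assumptions rather than fitted device parameters.

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
h = np.array([0.25, 0.5, 0.25])        # illustrative blur taps (sum to 1)
sigma_eta, gamma_p, s = 0.03, 1.0, 0.7

x = (rng.random(200_000) < 1 - s).astype(float)        # halftone stand-in
y = gamma_p * np.convolve(x, h, mode='same') + rng.normal(0, sigma_eta, x.size)

var_th = gamma_p**2 * s*(1-s) * np.sum(h**2) + sigma_eta**2           # (10)
mu3 = s*(1-s)*(2*s - 1)                                               # (14)
skew_th = gamma_p**3 * mu3 * np.sum(h**3) / var_th**1.5               # (13)
mu4 = s*(1-s)*(1 - 3*s*(1-s))
beta = mu4*np.sum(h**4) + 3*(s*(1-s))**2*(np.sum(h**2)**2 - np.sum(h**4))
kurt_th = (gamma_p**4*beta + 6*gamma_p**2*s*(1-s)*sigma_eta**2*np.sum(h**2)
           + 3*sigma_eta**4) / var_th**2                              # (16), (17)

print(np.var(y), var_th)                       # sample vs. predicted
print(skew(y), skew_th)
print(kurtosis(y, fisher=False), kurt_th)
```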
V. COMBINING THE METRICS

Extending the approach which combines the average and variance metrics discussed in [3], it is also possible to combine the skewness and the kurtosis, along with other metrics of a received symbol, into a single metric to reduce the detection error rate. Considering a stochastic interpretation of the detection metrics, the result of each metric is approximately normally distributed. For this reason, in this work the Bayes classifier [15] is employed to combine the metrics, due to its optimal properties for normally distributed patterns [15]. Results of the detection using the Bayes classifier are given in Section VII. The reader is referred to [3] for an example of how to apply this classifier in the scenario discussed in this paper.
Although some detection metrics have better performance than others, because all of the first four statistical moments are useful to separate classes, combining them increases the distance between classes and consequently reduces the detection error rate [16], at the expense of increased computational complexity. It is also possible to include useful spectral or other nonstatistical metrics in the combination, although this is not discussed in this paper.
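A minimal sketch of this combination step is given below, assuming Gaussian class-conditional densities as in the Bayes classifier of [15], implemented here through quadratic discriminant analysis; the synthetic blocks and the luminance levels 0.95 and 0.84 (taken from the TLM experiment of Section VII) stand in for real printed and scanned training data.

```python
import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(0)

def ps_block(s, n=64):
    """Synthetic printed-and-scanned block for luminance s (stand-in)."""
    x = (rng.random(n * n) < 1 - s).astype(float)
    return x + rng.normal(0.0, 0.05, x.size)

def metrics(b):
    """The four detection metrics of a received symbol block."""
    return [b.mean(), b.var(), skew(b), kurtosis(b, fisher=False)]

levels = {0: 0.95, 1: 0.84}            # bit 0 and bit 1 luminances
X = np.array([metrics(ps_block(levels[bit])) for bit in [0, 1] * 200])
y = np.array([0, 1] * 200)

# Bayes classifier for Gaussian class-conditional densities (QDA).
clf = QuadraticDiscriminantAnalysis().fit(X[:300], y[:300])
print("error rate:", 1 - clf.score(X[300:], y[300:]))
```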
VI. PRACTICAL AUTHENTICATION PROTOCOL
Taking advantage of the reduced error rate provided by combining the detection metrics, a practical protocol for document authentication based on TLM [5], [6] is proposed. It is similar to the system proposed in [33], but with an alternative detection method. Instead of employing TLM as a side message transmitter, the modulated characters [see Fig. 1(b)] are used to ensure that no character in the document has been altered. This is achieved by combining cryptography, optical character recognition (OCR) [21], [22], and the detection of characters with modulated luminances, discussed in this paper. Notice, however, that the proposed system can also be used to authenticate digital documents that are not subject to the PS channel.
The proposed framework for authentication scrambles the binary representation of the original text string with a key that depends on the string itself. The resulting scrambled vector is used to create another vector of dimension equal to the number of characters in the document. This vector is used as a rule to modulate each character individually, as illustrated in Fig. 1(b).
A related approach for image authentication, in which a digital watermark [10] is generated with a key that is a function of some feature of the original image, has been proposed in the literature, as in [19], [20], [24], for example. To avoid that this feature be modified by the embedding of the watermark itself, hence frustrating the watermark detection process, only characteristics of a portion of the image must be used. It is possible, for example, to extract features from the low-frequency components and to embed the watermark in the high-frequency components, as discussed in [10].
In contrast, in the authentication system proposed here, the modified character luminances do not alter the feature used to generate the permutation key, which is the characters' meanings. The system is described in the following. An illustrative diagram of the encryption process is given in Fig. 5.
A. Encryption

Let vector t, of size c, represent a text string with c characters.
Let vector l = [l(1), ..., l(c)] represent the luminances of characters t(1), ..., t(c), respectively.
Let A denote the character alphabet (the 8-bit ASCII set, for example), where A has cardinality N_A.
Let b(j) be the binary representation of symbol t(j), and let b be the binary representation of t, where b has size c log₂(N_A) bits.
Let f(t) be a function of t; f(t) is used as a key to generate a pseudo-random sequence (PRS) r, such that the PRSs are ideally orthogonal for different keys f(t).
Let c̃ = b ⊕ r, where ⊕ represents the exclusive or (XOR) logical operation.
Let M be a function that maps c̃, with c log₂(N_A) bits, to another vector v, with one symbol per character.

Fig. 5. Encryption block diagram. Block s/b represents string-to-binary conversion. Block M represents the mapping of the scrambled vector from c log₂(N_A) bits to one symbol per character. The symbol ⊕ represents the exclusive or (XOR) logical operation.

In order to provide security, v is encrypted with the private key of a public key cryptosystem [17]. Public key cryptosystems use two different keys, one for encryption and one for decryption. The private key is only available to users who are allowed to perform the authentication process. On the other hand, anyone can have access to the public key to check whether a document is authentic, without the ability to generate a new authenticated document.
Let e be the encrypted version of v based on the private key, using a public key encryption scheme such as RSA [17], for example. To authenticate the text document, the vector l (which represents the luminances of the characters in the document) is modified such that l = e. Therefore, the document is authenticated by setting the luminance of each character equal to the corresponding element of e. A code sketch of these steps is given below.
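In the sketch, the hash-based key derivation f(t), the PRS generator, and the parity-based mapping M are illustrative stand-ins (the paper only requires that f(t) depend on the string and that M produce one symbol per character); the RSA encryption step of [17] is indicated by a comment rather than implemented.

```python
import hashlib
import numpy as np

def string_to_bits(t):
    """Block s/b of Fig. 5: string to binary representation."""
    return np.unpackbits(np.frombuffer(t.encode('ascii'), dtype=np.uint8))

def encrypt_modulation(text, n_levels=2):
    """Derive the per-character modulation vector from the text itself."""
    b = string_to_bits(text)                        # binary representation of t
    # Key f(t): here a hash of the string (illustrative assumption).
    key = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], 'big')
    r = np.random.default_rng(key).integers(0, 2, b.size, dtype=np.uint8)
    c = b ^ r                                       # scrambled vector (XOR)
    # Mapping M: fold the scrambled bits to one symbol per character
    # (a simple parity-based stand-in for the paper's mapping).
    v = c.reshape(len(text), 8).sum(axis=1) % n_levels
    # RSA encryption with the private key would be applied here; omitted.
    return v                                        # luminance index per character

print(encrypt_modulation("Document Image Processing"))
```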
B. Decryption

In the verification process, OCR is applied to the printed document. In addition, the luminance of each character is determined using the metrics proposed in this paper. Therefore, when testing the authenticity of the document, one has access to a received t̂ and a received l̂, where t̂ and l̂ represent the received versions of the vectors t and l, respectively. It is assumed that the conditions are controlled such that no OCR or luminance detection errors occur. Moreover, one has access to the public key for decryption in the RSA algorithm and to a scrambling key f(t̂), which depends on t̂.
Using the public key, it is possible to decrypt l̂ into a vector v̂. Using f(t̂), it is possible to scramble the binary representation of t̂ and, applying the same mapping rule M of the encryption process, to obtain a new vector ṽ. If ṽ = v̂, the document is assumed authentic. Otherwise, it is assumed that one or more characters have been altered. A block diagram of the authentication test process is given in Fig. 6.
If an attacker changes one or more characters in the document, ṽ and v̂ become two completely different (quasi-orthogonal) sequences with very high probability, failing the authentication test. A practical example of the proposed system is given in Section VII, and a verification sketch is given below.
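The matching verification sketch reuses the stand-in encrypt_modulation function from the encryption sketch and assumes error-free OCR and luminance detection, as stated above.

```python
import numpy as np

def verify(ocr_text, detected_levels):
    """Recompute the modulation from the OCR'd text and compare it with
    the luminance levels actually detected on the document."""
    expected = encrypt_modulation(ocr_text)    # same key f(t) and mapping M
    return bool(np.array_equal(expected, np.asarray(detected_levels)))

# Any character change alters both the recomputed key and the PRS, so the
# expected and detected sequences disagree with very high probability.
```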
Although OCR has been included in the detection process assuming that the document has been printed and scanned, the proposed authentication protocol can also be applied to digital documents. It does not require the use of appended files, and it is robust to format conversions, such as .pdf to .ps, for example. Hence, unlike a digital signature, which protects the binary codes of the documents, the system proposed here protects the visual content, or the meaning, of the document.

Fig. 6. Decryption block diagram. Block s/b represents string-to-binary conversion. Block M represents the mapping of the scrambled bits to one symbol per character. The symbol ⊕ represents the exclusive or (XOR) logical operation.

VII. EXPERIMENTS

The goal of this section is to validate experimentally the analyses of Sections III and IV and to illustrate that higher order moments can be used to detect a luminance change in printed symbols.
During the experiments, the noise and the distortion parameters of the PS channel vary depending on the printing and scanning devices used. The experiments are conducted with different combinations of printers and scanners, according to the legend in Table I. The printing and scanning resolutions were set to 300 dots/inch and pixels/inch, respectively. Typical values for the parameters in (1) depend on the device combination.

TABLE I
COMBINATIONS OF PRINTERS AND SCANNERS USED IN THE EXPERIMENTS

As discussed in [3], comparing the frequency responses of an original digital image and its PS version after several experiments, the blurring response h is well represented by a traditional Butterworth low-pass filter described by

H(u, v) = 1 / [ 1 + ( D(u, v) / f_c )^(2n) ]    (18)

where n is the filter order, f_c is the cutoff frequency, and D(u, v) is the Euclidean distance from point (u, v) to the origin (center) of the frequency spectrum. Although different filters could be used, for this model the filter order and cutoff frequency which yield the best approximation of the frequency response of the process are determined experimentally through curve fitting for the devices used.
Using the noise, gain, and blurring filter parameters described above, a character or symbol distorted with the proposed PS model is perceptually similar to an actual printed and scanned character, as illustrated in [3].
Regarding the perceptual impact of the method, if the modulation intensity exceeds a given perceptual threshold, it becomes easier for a human viewer to notice the modifications. Nevertheless, unlike regular images (such as natural photos), where the meaning of the image depends on the values of the pixels, in text documents the meaning is given by the shape of the characters, such that letters can be recognized and words interpreted. In this sense, characters could be of any color: as long as they are readable, the information content of the modified document will be exactly the same as that of the original all-black-character document. The relevance of the perceptual constraint is that it refrains the reader from being bothered or annoyed while reading, helping to ensure that the reader's attention is not drawn to the fact that the characters are modified.
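A sketch of the fitted low-pass model (18) is given below; the order and cutoff values are placeholders, since the fitted values depend on the devices used.

```python
import numpy as np

def butterworth_lowpass(shape, fc, n):
    """Frequency response of (18): H(u,v) = 1 / (1 + (D(u,v)/fc)^(2n))."""
    u = np.fft.fftfreq(shape[0])[:, None]
    v = np.fft.fftfreq(shape[1])[None, :]
    D = np.sqrt(u**2 + v**2)          # distance to the spectrum origin
    return 1.0 / (1.0 + (D / fc) ** (2 * n))

def apply_ps_blur(img, fc=0.15, n=2):     # fc, n: placeholder fit values
    """Apply the fitted blur in the frequency domain."""
    H = butterworth_lowpass(img.shape, fc, n)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * H))
```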

A. Experiment 1

The effect of a halftone skewness level that depends on the input luminance is illustrated in Fig. 7, where two curves are presented. The black curve (Theoretical) represents the theoretical skewness presented in (6). The gray curve (Bayer) represents the skewness of a halftone block (before PS) generated using the Bayer dithering matrix [18]. Similar experiments are presented regarding the kurtosis, as shown in Fig. 8. These figures illustrate that the analyses of Section III are in accordance with the results obtained from a practical halftone matrix.

Fig. 7. Effect of skewness dependent on the input luminance.

Fig. 8. Effect of kurtosis dependent on the input luminance.
B. Experiment 2

This experiment illustrates the validity of the channel model described in Section II and the expected values of the higher order moments as a function of the input luminance, determined analytically in Section IV. The effect of a printed and scanned variance level that depends on the input luminance is illustrated in Fig. 9, where two curves are presented. The black curve (Theoretical) represents the theoretical variance determined in (10). The gray curve (Experimental) represents the variance of printed and scanned blocks, originally of size 32 × 32. Similar experiments are presented regarding the skewness and the kurtosis determined in (13) and (16), as shown in Figs. 10 and 11, respectively. The Experimental curve in these figures corresponds to the averaging of the results obtained for the nine combinations of PS devices.

Fig. 9. Effect of variance dependent on the input luminance, after PS.

Fig. 10. Effect of skewness dependent on the input luminance, after PS.

Fig. 11. Effect of kurtosis dependent on the input luminance, after PS.

Fig. 12(a) shows the histogram of a PS block generated from a constant luminance value s = 0.3. Similarly, histograms for s = 0 and s = 0.65 are presented in Figs. 12(b) and (c), respectively, illustrating a change in the shape of the distribution.

Fig. 12. Histograms with different shapes illustrating the change in the skewness and in the kurtosis of a PS region, according to the input luminance s. (a) Histogram for s = 0.3. (b) Histogram for s = 0. (c) Histogram for s = 0.65.
C. Experiment 3

In this experiment, a multilevel 2-D bar code is printed with a sequence of 56 000 symbols with four possible luminance levels (2 bits/symbol) drawn from a four-level alphabet. Optimum values for the alphabet depend on the PS devices used, as discussed in [1] and [2], where the authors present a study on multilevel coding for the assumed PS channel. The original size (prior to printing) of each symbol is 8 × 8, corresponding to the size of one halftone block.
Table II shows the obtained bit error rates when performing the detection using the four suggested metrics (average, variance, skewness, kurtosis) separately. This table also presents the result of combining the metrics with the Bayes classifier, illustrating a smaller error rate. Because the variance and the kurtosis are symmetric around the middle of the luminance range, they cannot be used alone as detection metrics.

TABLE II
EXPERIMENTAL ERROR RATES FOR 2-D BAR CODES

In [2], for example, an error rate comparable to the first row (average detection) in Table II is observed. That rate is significantly improved when the multilevel coding with multistage decoding (MLC/MSD) proposed in [2] is employed. Notice, however, that such more advanced coding/decoding methods can also be applied to the higher order statistics proposed in this work.
The size of the 8 × 8 cell used is comparable to traditional nonmultilevel 2-D bar codes used in commercial applications. In [2], the authors compare the rate (in bytes per square inch) of well-known 2-D bar codes used in practice, such as Data Matrix, Aztec Code, and QR Code, to the rate of multilevel 2-D bar codes. The multilevel codes show a superior performance in comparison to the nonmultilevel codes when the modulation levels of the multilevel codes are properly assigned, according to the print-scan devices employed.
D. Experiment 4

This experiment implements the text hardcopy watermarking system of [5], [6], which embeds data by modifying the luminances of characters while respecting a perceptual transparency requirement. A sequence of 15 180 characters is printed and scanned.



The font type tested was Arial, size 12. The luminances of the characters were randomly modified to 0.95 or 0.84 with equal probability, where 0.95 corresponds to bit 0 and 0.84 corresponds to bit 1. To determine to which class (bit 0 or bit 1) each received character belongs, the four metrics discussed so far were tested. The resulting error rates are given in Table III.

TABLE III
EXPERIMENTAL ERROR RATES FOR TEXT WATERMARKING

An example of employing the Bayes classifier to combine two metrics (the average and the variance) is given in Fig. 13. Note that the decision boundary combining the metrics yields a reduced error rate, in comparison to the boundaries based solely on the average or the variance. Similarly, Fig. 14 illustrates a surface separating the two classes in the 3-D space formed by the average, the variance, and the skewness.

Fig. 13. Decision boundary combining two metrics. Note the dispersion and the offset caused by the distortions of the PS channel.

Fig. 14. Decision boundary combining three metrics.
Notice that this small error rate is achieved using standard consumer printing and scanning devices and regular paper. Using professional equipment, it is reasonable to assume that the error rate is close to zero in small documents (such as identification cards, passports, etc.), especially if complete perceptual transparency is not a requirement. These assumptions make practical the authentication protocol proposed in Section VI.
E. Experiment 5

This section illustrates two applications of the authentication protocol proposed in Section VI, one in printed form and one in digital form.
1) ID Card: An identification card is authenticated using TLM, as shown in Fig. 15. Notice that the intensity changes are made visible to illustrate the underlying process.

Fig. 15. ID authenticated using the protocol proposed in Section VI. Notice the modified character luminances.

A modified version


of the card is generated, where the last digit in the Valid Until field is modified. To reduce the probability of OCR and luminance detection errors, only numbers and upper and lower case characters of the alphabet are considered in this test.
For the nontampered document in Fig. 15, using an 8-bit ASCII table to represent the characters [17], the parameters discussed in Section VI are obtained. The modulation vector is composed of one element per character in the document, and the document is authenticated by altering the luminance of each character accordingly. Notice that the characters' luminances in the document in Fig. 15 are modified according to this vector.
After printing, the parameters are obtained again, based on the tampered printed document in Fig. 16. Because the last digit of the document is different, the decrypted sequence and the recomputed sequence are completely different, failing the equality test shown in Fig. 6.

Fig. 16. Scanned ID. The last digit in the birth date is modified from 8 to 9, as indicated by the arrow.
2) Paper Title: Some of the character luminances in the title of this paper are slightly modulated to a gray level. This can be verified using any screen capture tool and common image processing software. Increasing the luminance gain to a visible level, the text becomes as follows.

Document Image Processing for Paper Side Communications

Using the 8-bit ASCII standard, the parameters for this sample are obtained, which yields the modulation string illustrated in the character luminances above. If the "D" in "Document" is modified to "d", a completely different string is generated, and the authentication is not verified.

VIII. CONCLUSION
This paper reduces the detection error rate of printed symbols in cases where the luminances of the symbols depend on a message to be transmitted through the PS channel. It has been observed that, as a consequence of modifying the luminances, the halftoning in the printing process also modifies the higher order statistical moments of a symbol, such as the variance, the skewness, and the kurtosis. Therefore, in addition to the average luminance and spectral metrics, these moments can also be used to detect a received symbol. This is achieved without any modifications to the transmitting function. A PS channel model which accounts for the halftoning noise is described. Analyses determining the relationship between the average luminance and the higher order moments of a halftone image are presented, justifying the use of the new detection metrics. In addition to the extended detection metrics, this paper also proposes an authentication protocol for printed and digital documents, with which it is possible to determine whether one or more characters have been modified in a text document. The experiments illustrated the successful applicability of the new metrics, the reduced error rate achieved when the metrics are combined with the Bayes classifier, and two possible applications of the proposed authentication protocol, using as an example the title of this paper. Notice that the contributions presented can be combined with other methods, serving as a practical alternative for document authentication.
APPENDIX A

This Appendix derives the result presented in (13), namely the third central moment of the filtered halftone signal

E[ ((x̃ * h)(i))³ ] = E[ ( Σ_k h(k) x̃(i − k) )³ ]    (19)

Expanding the cube gives

E[ ((x̃ * h)(i))³ ] = Σ_k h³(k) E[x̃³(i − k)] + cross terms involving an isolated factor E[x̃(j)]    (20)

Recalling that x̃ is uncorrelated and zero-mean, taking the values 1 − p and −p with probabilities p and 1 − p, the cross terms vanish and

E[x̃³(i)] = p(1 − p)(1 − 2p) = s(1 − s)(2s − 1) = μ₃(s)    (21)

Σ_k h³(k) E[x̃³(i − k)] = μ₃(s) Σ_k h³(k)    (22)

Therefore, (19) can be written as

E[ ((x̃ * h)(i))³ ] = μ₃(s) Σ_k h³(k)    (23)

where μ₃(s) is given by (14).

APPENDIX B

This Appendix derives the result presented in (16), namely the fourth central moment of the filtered halftone signal

E[ ((x̃ * h)(i))⁴ ] = E[ ( Σ_k h(k) x̃(i − k) )⁴ ]    (24)

Expanding the fourth power gives

E[ ((x̃ * h)(i))⁴ ] = Σ_k h⁴(k) E[x̃⁴(i − k)] + 3 Σ_{j≠k} h²(j) h²(k) E[x̃²(j)] E[x̃²(k)] + cross terms involving an isolated factor E[x̃(j)]    (25)

Recalling that x̃ is uncorrelated and zero-mean, the remaining cross terms vanish, and with E[x̃²(i)] = s(1 − s), E[x̃⁴(i)] = μ₄(s) = s(1 − s)[1 − 3 s(1 − s)], and

Σ_{j≠k} h²(j) h²(k) = (Σ_k h²(k))² − Σ_k h⁴(k)    (26)

(24) can be written as

E[ ((x̃ * h)(i))⁴ ] = μ₄(s) Σ_k h⁴(k) + 3 s²(1 − s)² [ (Σ_k h²(k))² − Σ_k h⁴(k) ] = β(s)    (27)

as used in (16) and (17).

REFERENCES

[1] N. D. Quintela and F. Pérez-González, "Visible encryption: Using paper as a secure channel," in Proc. SPIE, 2003.
[2] R. Villán, S. Voloshynovskiy, O. Koval, and T. Pun, "Multilevel 2D bar codes: Towards high capacity storage modules for multimedia security and management," IEEE Trans. Inform. Forensics Security, vol. 1, no. 4, pp. 405-420, Dec. 2006.
[3] P. Borges and J. Mayer, "Text luminance modulation for hardcopy watermarking," Signal Process., vol. 87, pp. 1754-1771, 2007.
[4] A. K. Bhattacharjya and H. Ancin, "Data embedding in text for a copier system," in Proc. IEEE Int. Conf. Image Processing, 1999, vol. 2.
[5] R. Villán, S. Voloshynovskiy, O. Koval, J. Vila, E. Topak, F. Deguillaume, Y. Rytsar, and T. Pun, "Text data-hiding for digital and printed documents: Theoretical and practical considerations," in Proc. SPIE, Electron. Imag., 2006.
[6] P. V. Borges and J. Mayer, "Document watermarking via character luminance modulation," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, May 2006.
[7] R. A. Ulichney, "Dithering with blue noise," Proc. IEEE, vol. 76, no. 1, 1988.
[8] R. A. Ulichney, Digital Halftoning. Cambridge, MA: MIT Press, 1987.
[9] K. Solanki, U. Madhow, B. S. Manjunath, and S. Chandrasekaran, "Modeling the print-scan process for resilient data hiding," in Proc. SPIE, Electron. Imag., 2005.
[10] I. J. Cox, M. L. Miller, and J. A. Bloom, Digital Watermarking. San Mateo, CA: Morgan Kaufmann, 2002.
[11] S. Voloshynovskiy, O. Koval, F. Deguillaume, and T. Pun, "Visual communications with side information via distributed printing channels: Extended multimedia and security perspectives," in Proc. SPIE Electron. Imag., San Jose, CA, Jan. 18-22, 2004.
[12] M. Norris and E. H. B. Smith, "Printer modeling for document imaging," in Proc. Int. Conf. Imaging Science, Systems and Technology, 2004.
[13] T. Amano, "A feature calibration method for watermarking of document images," in Proc. Fifth Int. Conf. Document Analysis and Recognition, ICDAR '99, Sept. 20-22, 1999.
[14] D. Manolakis, V. Ingle, and S. Kogon, Statistical and Adaptive Signal Processing. New York: McGraw-Hill, 2000.
[15] R. Duda, P. Hart, and D. Stork, Pattern Classification. New York: Wiley-Interscience, 2000.
[16] S. Theodoridis and K. Koutroumbas, Pattern Recognition. New York: Academic Press, 2006.
[17] B. Sklar, Digital Communications. Englewood Cliffs, NJ: Prentice-Hall, 2001.
[18] B. E. Bayer, "An optimum method for two-level rendition of continuous tone pictures," in Proc. IEEE Int. Conf. Communications, 1973.
[19] J. Cannons and P. Moulin, "Design and statistical analysis of a hash-aided image watermarking system," IEEE Trans. Image Process., vol. 13, no. 10, pp. 1393-1408, Oct. 2004.
[20] X. Li and X. Xue, "Fragile authentication watermark combined with image feature and public key cryptography," in Proc. 7th Int. Conf. Signal Processing, ICSP '04, Aug. 31-Sep. 4, 2004, vol. 3.
[21] S. Mori, C. Y. Suen, and K. Yamamoto, "Historical review of OCR research and development," Proc. IEEE, vol. 80, no. 7, pp. 1029-1058, Jul. 1992.
[22] Y. Xu and G. Nagy, "Prototype extraction and adaptive OCR," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 12, pp. 1280-1296, Dec. 1999.
[23] A. M. Alattar and O. M. Alattar, "Watermarking electronic text documents containing justified paragraphs and irregular line spacing," Proc. SPIE, vol. 5306, Jun. 2004.
[24] P. W. Wong and N. Memon, "Secret and public key image watermarking schemes for image authentication and ownership verification," IEEE Trans. Image Process., vol. 10, no. 10, Oct. 2001.
[25] H. Yang and A. C. Kot, "Text document authentication by integrating inter character and word spaces watermarking," in Proc. IEEE Int. Conf. Multimedia and Expo, 2004.
[26] J. T. Brassil, S. Low, and N. F. Maxemchuk, "Copyright protection for the electronic distribution of text documents," Proc. IEEE, vol. 87, no. 7, pp. 1181-1196, Jul. 1999.
[27] D. Huang and H. Yan, "Interword distance changes represented by sine waves for watermarking text images," IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 12, pp. 1237-1245, Dec. 2001.
[28] M. Wu and B. Liu, "Data hiding in binary image for authentication and annotation," IEEE Trans. Multimedia, vol. 6, Aug. 2004.
[29] M. Mese and P. P. Vaidyanathan, "Recent advances in digital halftoning and inverse halftoning methods," IEEE Trans. Circuits Syst. I: Fund. Theory Applicat., vol. 49, no. 6, pp. 790-805, Jun. 2002.
[30] Q. Mei, E. K. Wong, and N. Memon, "Data hiding in binary text documents," in Proc. SPIE, Security and Watermarking of Multimedia Contents III, San Jose, CA, Jan. 2001, vol. 2.
[31] M. Jiang, E. K. Wong, N. Memon, and X. Wu, "Steganalysis of degraded document images," in Proc. IEEE Workshop on Multimedia Signal Processing, Oct. 2005.
[32] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Englewood Cliffs, NJ: Prentice-Hall, 1993, vol. I.

[33] R. Villán, S. Voloshynovskiy, O. Koval, F. Deguillaume, and T. Pun, "Tamper-proofing of electronic and printed text documents via robust hashing and data-hiding," in Proc. SPIE-IS&T Electronic Imag., Security, Steganography, and Watermarking of Multimedia Contents IX, San Jose, CA, 2007.
Paulo Vinicius Koerich Borges (S'04) received the B.E. degree in electrical engineering and the M.Sc. degree in communications and signal processing from the Federal University of Santa Catarina (UFSC), Brazil, in 2002 and 2004, respectively. In 2001, he was a Visiting Research Student at the University of Manchester, U.K. In 2007, he received the Ph.D. degree from Queen Mary, University of London (QMUL) and UFSC, on the subject of digital watermarking and data hiding.
He is currently a Researcher at QMUL, working on the EU IST MESH Project on the subject of video event detection. His research interests are video event detection/classification, digital watermarking, and hardcopy document authentication, besides general signal processing and pattern recognition.

Joceli Mayer (M'99) graduated in electrical engineering from the Universidade Federal de Santa Catarina (UFSC), Brazil, in 1988, received the Master's degree in electrical engineering from UFSC in 1991, the Master's degree in computer engineering from the University of California, Santa Cruz (UCSC) in 1998, and the Ph.D. degree from UCSC in 1999.
He has been with UFSC since 1993, where he is currently an Associate Professor of Electrical Engineering teaching in the undergraduate and graduate programs. He has published over 70 articles in conferences and periodicals. He is coordinating projects on super-resolution, speech compression, image watermarking, and hardcopy document authentication with the support of industry and government agencies, including FINEP, CNPq, Hewlett-Packard, and Intelbras.


Ebroul Izquierdo (SM'02) received the Dr. Rer. Nat. (Ph.D.) degree from Humboldt University, Berlin, Germany, in 1993, for his thesis on the numerical approximation of algebraic-differential equations.
He is a Professor (Chair) of multimedia and computer vision and head of the Multimedia and Vision Group at Queen Mary, University of London, U.K. From 1990 to 1992, he was a teaching assistant at the Department of Applied Mathematics, Technical University Berlin. From 1993 to 1997, he was with the Heinrich-Hertz Institute for Communication Technology, Berlin, Germany, as an Associate Researcher. From 1998 to 1999, he was with the Department of Electronic Systems Engineering at the University of Essex, U.K., as a Senior Research Officer. Since 2000, he has been with the Electronic Engineering Department, Queen Mary, University of London.
Prof. Izquierdo is an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (TCSVT) and the EURASIP Journal on Image and Video Processing. He has served as guest editor of three special issues of the IEEE TCSVT, a special issue of the journal Signal Processing: Image Communication, and a special issue of the EURASIP Journal on Applied Signal Processing. He has published over 300 technical papers, including chapters in books.
Prof. Izquierdo is a Chartered Engineer, a Fellow of the Institution of Engineering and Technology (IET), chairman of the Executive Group of the IET Visual Engineering Professional Network, a senior member of the IEEE, a member of the British Machine Vision Association, and a member of the steering board of the Networked Audiovisual Media technology platform of the European Union. He is a member of the programme committee of the IEEE Conference on Information Visualization, the international program committee of the EURASIP & IEEE Conference on Video Processing and Multimedia Communication, and the European Workshop on Image Analysis for Multimedia Interactive Services. Prof. Izquierdo has served as session chair and organiser of invited sessions at several conferences. He coordinated the EU IST project BUSMAN on video annotation and retrieval. He is a main contributor to the IST integrated projects aceMedia and MESH on the convergence of knowledge, semantics, and content for user-centred intelligent media services. He coordinates the European project Cost292 and K-Space, the FP6 network of excellence on semantic inference for automatic annotation and retrieval of multimedia content.
