$$f_2(bg(x, y)) = \begin{cases} T_0 \left( 1 - \left( \dfrac{bg(x, y)}{127} \right)^{1/2} \right) + 3, & \text{for } bg(x, y) \le 127 \\[2mm] \gamma \left( bg(x, y) - 127 \right) + 3, & \text{for } bg(x, y) > 127 \end{cases} \qquad (3)$$
$$\alpha(bg(x, y)) = bg(x, y) \cdot 0.0001 + 0.115 \qquad (4)$$

$$\beta(bg(x, y)) = \lambda - bg(x, y) \cdot 0.01, \quad \text{for } 0 \le x < H,\ 0 \le y < W, \qquad (5)$$
where H and W are the height and width of the image, respectively. bg(x, y) is the
average background luminance and mg(x, y) is the maximum weighted average of
luminance differences around the pixel at (x, y).
$$mg(x, y) = \max_{k=1,2,3,4} \left\{ grad_k(x, y) \right\} \qquad (6)$$
$$grad_k(x, y) = \frac{1}{16} \sum_{i=1}^{5} \sum_{j=1}^{5} p(x - 3 + i,\ y - 3 + j)\, G_k(i, j), \quad \text{for } 0 \le x < H,\ 0 \le y < W, \qquad (7)$$
$$bg(x, y) = \frac{1}{32} \sum_{i=1}^{5} \sum_{j=1}^{5} p(x - 3 + i,\ y - 3 + j)\, B(i, j). \qquad (8)$$
$$G_1 = \begin{bmatrix} 0&0&0&0&0 \\ 1&3&8&3&1 \\ 0&0&0&0&0 \\ -1&-3&-8&-3&-1 \\ 0&0&0&0&0 \end{bmatrix}, \quad G_2 = \begin{bmatrix} 0&0&1&0&0 \\ 0&8&3&0&0 \\ 1&3&0&-3&-1 \\ 0&0&-3&-8&0 \\ 0&0&-1&0&0 \end{bmatrix},$$

$$G_3 = \begin{bmatrix} 0&0&1&0&0 \\ 0&0&3&8&0 \\ -1&-3&0&3&1 \\ 0&-8&-3&0&0 \\ 0&0&-1&0&0 \end{bmatrix}, \quad G_4 = \begin{bmatrix} 0&1&0&-1&0 \\ 0&3&0&-3&0 \\ 0&8&0&-8&0 \\ 0&3&0&-3&0 \\ 0&1&0&-1&0 \end{bmatrix},$$

$$B = \begin{bmatrix} 1&1&1&1&1 \\ 1&2&2&2&1 \\ 1&2&0&2&1 \\ 1&2&2&2&1 \\ 1&1&1&1&1 \end{bmatrix} \qquad (9)$$
The function $f_1(x, y)$ models the spatial masking effect, which means that values near edges in an image can be changed much more than values near constant intensities. The function $f_2(x, y)$ defines the visibility threshold due to background luminance, and the values $T_0$, $\gamma$ and $\lambda$ are chosen to be 17, 3/128 and 1/2, respectively. [4]
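As a concrete illustration, the JND profile of equations (3)-(9) can be computed directly. The following Python sketch is not part of the original system; the signs of the gradient operators and the combination rule JND(x, y) = max{f1(x, y), f2(x, y)} with f1(x, y) = mg(x, y)·α(bg(x, y)) + β(bg(x, y)) follow Chou and Li's model [4].

```python
import numpy as np

# Gradient operators G1..G4 and background operator B from equation (9),
# with the negative entries restored per Chou and Li [4].
G = np.array([
    [[0, 0, 0, 0, 0], [1, 3, 8, 3, 1], [0, 0, 0, 0, 0],
     [-1, -3, -8, -3, -1], [0, 0, 0, 0, 0]],
    [[0, 0, 1, 0, 0], [0, 8, 3, 0, 0], [1, 3, 0, -3, -1],
     [0, 0, -3, -8, 0], [0, 0, -1, 0, 0]],
    [[0, 0, 1, 0, 0], [0, 0, 3, 8, 0], [-1, -3, 0, 3, 1],
     [0, -8, -3, 0, 0], [0, 0, -1, 0, 0]],
    [[0, 1, 0, -1, 0], [0, 3, 0, -3, 0], [0, 8, 0, -8, 0],
     [0, 3, 0, -3, 0], [0, 1, 0, -1, 0]],
], dtype=float)
B = np.array([[1, 1, 1, 1, 1], [1, 2, 2, 2, 1], [1, 2, 0, 2, 1],
              [1, 2, 2, 2, 1], [1, 1, 1, 1, 1]], dtype=float)

def jnd_profile(p, T0=17.0, gamma=3.0 / 128.0, lam=0.5):
    """JND profile of a grayscale image p (equations (3)-(9))."""
    H, W = p.shape
    jnd = np.zeros((H, W))
    for x in range(2, H - 2):
        for y in range(2, W - 2):
            block = p[x - 2:x + 3, y - 2:y + 3].astype(float)
            bg = np.sum(block * B) / 32.0                    # eq. (8)
            mg = max(abs(np.sum(block * g)) / 16.0           # eqs. (6)-(7)
                     for g in G)
            # eqs. (4)-(5): f1 = mg * alpha(bg) + beta(bg)
            f1 = mg * (bg * 0.0001 + 0.115) + (lam - bg * 0.01)
            if bg <= 127:                                    # eq. (3)
                f2 = T0 * (1.0 - np.sqrt(bg / 127.0)) + 3.0
            else:
                f2 = gamma * (bg - 127.0) + 3.0
            jnd[x, y] = max(f1, f2)                          # JND = max(f1, f2)
    return jnd
```

On a uniform background of luminance 128, for example, the gradient responses vanish and the profile reduces to the luminance term of equation (3).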
To test how well the JND calculations work, Chou and Li developed a so-called
PSPNR (Peak Signal to Perceptible Noise Ratio) value. Often, the PSNR (Peak
Signal to Noise Ratio) value is calculated with
$$\mathrm{PSNR} = 20 \log_{10} \frac{255}{\sqrt{\mathrm{MSE}}}, \qquad (10)$$
where MSE is the mean squared error. Unfortunately, the PSNR cannot accurately reflect the perceptual quality of the image, and therefore other methods are needed [4].
The PSPNR value measures the perceptible distortion energy and it is defined as
$$\mathrm{PSPNR} = 20 \log_{10} \frac{255}{\sqrt{E\left\{ \left[ \left| \hat{p}(x, y) - p(x, y) \right| - JND_{fb}(x, y) \right]^2 \delta(x, y) \right\}}}, \qquad (11)$$

$$\delta(x, y) = \begin{cases} 1, & \left| \hat{p}(x, y) - p(x, y) \right| > JND_{fb}(x, y) \\ 0, & \left| \hat{p}(x, y) - p(x, y) \right| \le JND_{fb}(x, y) \end{cases} \quad \text{for } 0 \le x < H,\ 0 \le y < W,$$
where $\hat{p}(x, y)$ denotes the reconstructed pixel at (x, y) and $JND_{fb}(x, y)$ the original JND profile. [4]
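The PSNR of equation (10) and the PSPNR of equation (11) can be sketched in a few lines of Python; this is an illustration (the function names are my own), and the expectation in (11) is replaced by a mean over the image.

```python
import numpy as np

def psnr(orig, recon):
    """Peak signal-to-noise ratio, equation (10)."""
    mse = np.mean((orig.astype(float) - recon.astype(float)) ** 2)
    return 20.0 * np.log10(255.0 / np.sqrt(mse))

def pspnr(orig, recon, jnd):
    """Peak signal-to-perceptible-noise ratio, equation (11):
    errors below the JND threshold count as imperceptible."""
    diff = np.abs(orig.astype(float) - recon.astype(float))
    delta = (diff > jnd).astype(float)        # visibility indicator delta(x, y)
    perceptible = np.mean(((diff - jnd) * delta) ** 2)
    return 20.0 * np.log10(255.0 / np.sqrt(perceptible))
```

Because sub-threshold errors are discarded, the PSPNR of a distorted image is never lower than its PSNR; when no error exceeds the JND, the ratio becomes infinite.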
2.3. Robustness
A watermark that will not endure even the most common data processing, such as scaling, is useless. Even fragile watermarks, which are meant to break, should ideally not break by accident. This chapter presents some of the expected attacks, including JPEG compression and geometrical attacks. The print-scan process and picture taking with a mobile phone are considered separately.
2.3.1 Robust and fragile watermarks
Most watermarks are designed to resist many kinds of attacks meant to destroy the embedded information, but some watermarks are made to break. These are called fragile watermarks, and their purpose is to detect tampering with the image. If the image is purposefully changed, the watermark is destroyed.
Robust watermarks are required to survive through almost all attacks, but
designing such watermarks is very difficult. It is practically impossible to design a
watermarking system that would resist all kinds of attacks. With a careful analysis of
system requirements, however, it is possible to design a watermarking system that is
robust against most probable attacks in the required environment. One way to deal
with different kinds of attacks is to use multiple watermarking. That is, a few watermarks are embedded in the image, each of which is designed to recover from a different kind of attack. Multiple watermarking is explained in more detail in
section 3.3.
2.3.2 Printed images
Most of the time, when talking about the print-scan process, only the scanning process is considered and the printing process is neglected. However, the printing process also introduces attacks of its own on the watermark. It is generally
acknowledged that the printing quality varies between different printers. Perry et al.
[5] experimented with different printers and concluded that the end products of
different printers vary across different manufacturers and even between identical
models from the same manufacturer.
Paper quality obviously affects the quality of the printed image, and Perry et al. noted that ink density also has an effect on the result [5]. These results show that the printing process should not be neglected when reading watermarks from printed images, but carefully considered.
2.3.3 Print-scan
The print-scan process has many similarities with photographing, and it has therefore been used as a starting point when defining a watermarking system for camera phones. In the process, the watermarked image is first printed and then scanned, and as a consequence, the watermark should be robust against various kinds of attacks. Figure 3 shows the user interface of the Epson GT-15000 scanner, where the user defines the scanning area with a dashed-line quadrilateral. A large portion of the background of the image is also cropped along with it, and the watermark is no longer in the centre of
the scanned image. The watermark should endure through geometrical
transformations, such as rotation, scaling and translation, but it should also be
readable after DA/AD transform and noise addition and it should not get broken by
slight cropping of edges. Some research has been done recently on the properties of
the print-scan process, but the problem is complex, because the distortions during the
print-scan process are printer/scanner-dependent and time-variant even for the same
printer/scanner [6, 7]. While trying to find properties that are invariant to the print-
scan process, Solanki et al. [7] studied the print-scan properties of the discrete
Fourier transform (DFT) magnitudes and concluded that:
1. The low and mid frequency coefficients are preserved much better than the
high frequency ones.
2. In the low and mid frequency bands, the coefficients with low magnitudes
see a much higher noise than their neighbours with high magnitudes.
Figure 3. The user interface of the Epson GT-15000 scanner and scanning area
selected by a user.
3. Coefficients with higher magnitudes see gain of roughly unity.
4. Slight modifications to the selected high magnitude low frequency
coefficients do not cause significant perceptual distortion to the image.
These properties were further studied by He and Sun [6], who introduced three more
properties including:
5. Most textures can be preserved, or, most relationships between DFT
coefficients are preserved though individual DFT magnitude may vary.
6. The dynamic range of intensity values is reduced, that is, the original range
between 0-255 becomes 70-250 after the print-scan process.
7. The distribution of pixel values after the print-scan process looks roughly like
a spindle as in Figure 4.
Figure 4. The intensity distribution of the print-scan process. X-axis represents the
original intensity while the Y-axis represents the print-scanned intensity.
These results give some guidelines to design a working print-scan robust
watermarking system.
2.3.4 Mobile phone with a camera
While the print-scan process is clearly a two dimensional problem, taking a picture
with a camera phone, that is, the print-cam process, is a three dimensional one. All
attacks that occur in the print-scan process will also occur in the print-cam process.
This is by no means the end of the story: photographing with a camera phone introduces an abundance of further attacks on watermarking systems.
Some of the attacks explained here are due to the mobile phone camera properties
and some are interlinked with the camera lens. The camera phone itself presents
some technological constraints that need to be considered. One of the biggest
problems is the low processing power of the camera phones, which sets new
requirements for the watermarking system. The application must be lightweight and
its memory consumption must not exceed certain limits if the watermark processing
is done in the phone. The watermarking system should also be robust against JPEG-
compression because in most of the camera phones the captured image is
automatically compressed before saving. The JPEG-compression is explained in
more detail in section 2.3.6.
The cameras in mobile phones are not of high quality, and although they are approaching digital cameras in quality, they are still far behind. At present, the best cameras in mobile phones have a resolution of two megapixels or more, but such camera phones are still rare. It must be remembered that the number of megapixels is not the whole truth; the quality of the optics also has a huge impact on the quality of a photographed image.
Even high quality optics will not entirely save the image from the pincushion and barrel distortions shown in Figure 5. In barrel distortion, straight lines in the real world bow outward in images, whereas in pincushion distortion straight lines bow inward. In both cases, the amount of distortion is larger near the image edges. Fortunately, this type of distortion can be corrected easily, because the properties of the lens stay the same for a given camera, and therefore the parameters that define the amount of distortion can be determined beforehand for each camera lens.
Figure 5. Chess patterned reference image a) original image b) barrel
distorted image c) pincushion distorted image.
When correcting barrel distortions, all that needs to be known are the properties of the camera lens. These properties can be found by taking one or more pictures of a reference image and analysing them. A reference image can be, for example, a chessboard image, where black and white squares alternate as in Figure 5.
The properties of the lens stay the same in every picture taken, and after the
properties have been once found, the barrel distortion can always be inverted.
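The principle of inverting a radial distortion can be sketched with a simplified one-parameter model. This is not the generic camera model of the Kannala toolbox [8, 9], only an illustration of how a known lens parameter (here a hypothetical k1) lets the distortion be undone by fixed-point iteration.

```python
import numpy as np

def undistort_points(pts, k1, center=(0.0, 0.0), iters=20):
    """Invert the one-parameter radial model
        x_d = c + (x_u - c) * (1 + k1 * r_u**2)
    by fixed-point iteration, mapping distorted points back to
    their undistorted positions."""
    c = np.asarray(center, dtype=float)
    p = np.asarray(pts, dtype=float) - c
    u = p.copy()                                   # initial guess: no distortion
    for _ in range(iters):
        r2 = np.sum(u ** 2, axis=-1, keepdims=True)  # current estimate of r_u^2
        u = p / (1.0 + k1 * r2)
    return u + c
```

The iteration converges quickly for the small distortion coefficients typical of camera lenses; once k1 has been calibrated for a lens, the same inversion applies to every picture taken with it.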
In this work, separate software is used for correcting barrel distortions: the Camera Calibration Toolbox for Generic Lenses, a Matlab toolbox made by Kannala, which is freely available on the Internet [8]. The toolbox is based on the generic camera
model and enables correction of barrel distortions as well as several other corrections
[9]. The calibration of the camera was done by using a calibration cube shown in
Figure 6.
Figure 6. Reference cube for the Calibration Toolbox for Generic Lenses.
Other kinds of distortions that should be corrected are the effects of the three dimensional world, that is, perspective distortions. It is practically impossible to set the camera so that it is entirely perpendicular to the image, and therefore the picture taken will be slanted.
2.3.5 Geometrical attacks
Robustness against geometrical attacks is necessary in designing a print-scan or
print-cam robust watermarking system. In this research, mostly rotation, translation
and scaling are studied, but attention is also paid to barrel distortion and perspective transformations.
In Figure 7, there is an example photograph of a watermarked image that has
been taken with a Nokia N90 camera phone. As seen, there is a visible barrel
distortion in the image and also the perspective has somewhat changed: the right side
of the image is slightly narrower than the left side. These distortions make the
reading of the watermark difficult, and without a proper watermarking technology all
the information embedded in the image could be lost.
The previously proposed methods for reading the watermark from distorted images
can be divided roughly into two main categories. The first is to find out the
geometrical transformations that the image has gone through and then apply an
inverse transform [10]. The other way is to embed the watermark in a transformation
invariant domain, such as the Fourier-Mellin domain [11].
While taking a picture with a camera, it is practically impossible to keep the
camera perfectly straight and perpendicular to the object as shown in Figure 8.
Therefore some perspective transformations will happen. A perspective transform
Figure 7. An image taken with an N90 camera phone, where some distortions have occurred.
is a result of projecting a three dimensional scene on the two dimensional
image plane. Usually, a perspective transformation is well approximated by an affine transformation and is equivalent to the composed effects of translation, rotation, scaling and shear. Here, homogeneous coordinates are utilized to define the transformation matrices, because all affine transformations can be represented as
matrix multiplications in homogeneous coordinates [12]. Homogeneous coordinates
are explained in more detail in Appendix 1.
Figure 8. When the camera is not perpendicular to the object, perspective transforms will happen.
Translation is defined as an operation that displaces image points by a fixed distance in a given direction. It is possible to describe the translation of point P to point P' by specifying a displacement vector d:

$$P' = P + d, \qquad (12)$$
for all points P on the object. The homogeneous coordinate forms of these points and
the vector are
$$P = \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \qquad P' = \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}, \qquad d = \begin{bmatrix} d_x \\ d_y \\ 0 \end{bmatrix}, \qquad (13)$$
from where we can see that
$$x' = x + d_x, \qquad y' = y + d_y. \qquad (14)$$
This result can be represented as a matrix multiplication [12]:
$$P' = TP, \qquad (15)$$

where

$$T = \begin{bmatrix} 1 & 0 & d_x \\ 0 & 1 & d_y \\ 0 & 0 & 1 \end{bmatrix}. \qquad (16)$$
For scaling where a fixed point, that is, the point that is unchanged by the
transformation, is at the origin, the two corresponding equations to (12) are
$$x' = s_x x, \qquad y' = s_y y, \qquad (17)$$

where $s_x$ and $s_y$ are the scaling coefficients for the x and y dimensions, respectively [12].
These equations can be combined as
$$P' = SP, \qquad (18)$$
where the transformation matrix is
$$S = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}. \qquad (19)$$
The third basic transformation matrix, rotation, can be derived similarly. The fixed
point is again set at the origin and the equations for rotation are
$$x' = x \cos\theta - y \sin\theta, \qquad y' = x \sin\theta + y \cos\theta, \qquad (20)$$

where $\theta$ is the rotation angle counter-clockwise about the origin [12]. The matrix form is
$$P' = RP, \qquad (21)$$
where
$$R = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}. \qquad (22)$$
The motivation for using transformation matrices to represent transformations is
that transformations can be combined and inverted. This can be done by using matrix
multiplication [12]. For example, if T is a translation matrix and R a rotation matrix, we
get
$$b = Ca = RTa, \qquad (23)$$
where a is some vector, C is the new combined transformation matrix of R and T,
and b is the resulting translated and rotated vector. The order of the transformation matrices is important, because RT is not the same as TR. Here RT means that the vector a has been translated first to some location and then rotated around the origin. If the equation had been TR, the vector a would have been rotated first and the rotated vector then translated.
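The matrices of equations (16), (19) and (22), and the non-commutativity of RT and TR, can be checked numerically; a small Python sketch:

```python
import numpy as np

def translation(dx, dy):
    # Equation (16)
    return np.array([[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]])

def scaling(sx, sy):
    # Equation (19)
    return np.array([[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]])

def rotation(theta):
    # Equation (22)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

a = np.array([1.0, 0.0, 1.0])              # the point (1, 0) in homogeneous form
T, R = translation(2.0, 0.0), rotation(np.pi / 2.0)
b_rt = R @ T @ a   # translate to (3, 0), then rotate about the origin: (0, 3)
b_tr = T @ R @ a   # rotate to (0, 1), then translate: (2, 1)
```

Since the two products give different points, the order in which the matrices are composed must match the order of the intended transformations.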
2.3.6 JPEG transform
In many kinds of image processing applications, compression algorithms play an
important role. Reducing the file size of an image is necessary for storage and
transmission. Especially in mobile phone environment, where the space is scarce,
heavy compression is used. One of the most well known and widely used algorithms
is JPEG (Joint Photographic Experts Group), which is usually defined as a lossy
compression algorithm. This means that some of the information the image contains
is lost during compression. The losses are usually not noticeable to the human eye, but they affect the watermark extraction quality.
Most of the time when JPEG is talked about, the image compression standard is meant. Actually, the abbreviation JPEG refers to a joint ISO/CCITT (International Organization for Standardization / International Telegraph and Telephone Consultative Committee) committee that has published several standards, including ISO/IEC IS 10918-1 | ITU-T Recommendation T.81, which is often referred to as JPEG. The standard was approved by ISO and CCITT, which is now called ITU-T (International Telecommunication Union, Telecommunication Standardization Sector), in 1994. [13]
JPEG is designed to be an efficient coding scheme for continuous tone (multilevel)
still images and it was intended to become the first international digital compression
standard for still images. It has four encoding modes under which various coding
algorithms are defined:
1. Sequential encoding
2. Progressive encoding
3. Lossless encoding
4. Hierarchical encoding
The implementations are not required to cover all of these, but the baseline system is
based on sequential coding. [14, 15]
The coding algorithms are mainly based on the two dimensional DCT (discrete cosine transform), except for the lossless encoding scheme, which employs predictive processes.
In the lossless encoding, predictive coding and entropy coding are used. The
resulting compression ratio is only about 2:1 but because no information is lost, the
decoded image is an exact replica of the original image unlike in DCT coding
schemes where some of the information is always lost in quantization. [14]
In the DCT based encoding process, samples of an image are grouped into 8x8
blocks, each of which is transformed with DCT into a set of 64 coefficients. Each of
the coefficients is then quantized by a different uniform quantizer, where the
quantization step-sizes are based on a visibility threshold of 64-element quantization
matrices. The standard does not specify default values for quantization tables but lets
the applications specify values for their particular task. [14, 15]
After quantization, entropy coding is applied. The DC (Direct Current) coefficient is differentially encoded by using the previous quantized DC coefficient to predict the current DC coefficient. The 63 AC (Alternating Current) coefficients are transformed
into one dimensional sequence with a zigzag scan shown in Figure 9. The one
dimensional sequence is then entropy coded by using either Huffman or Arithmetic
coding. For the baseline system, only the Huffman coding is used. [15]
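The zigzag reordering of Figure 9 can be expressed compactly; the following is a sketch of the traversal rule itself, not any particular codec's implementation:

```python
import numpy as np

def zigzag(block):
    """Reorder a square block of quantized DCT coefficients into the
    one-dimensional zigzag sequence of Figure 9 (DC coefficient first)."""
    n = block.shape[0]
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   # group by anti-diagonal i + j; the traversal direction
                   # alternates from one diagonal to the next
                   key=lambda ij: (ij[0] + ij[1],
                                   ij[0] if (ij[0] + ij[1]) % 2 else ij[1]))
    return np.array([block[i, j] for i, j in order])
```

The reordering places the low-frequency coefficients first, so the long runs of zeros produced by quantization end up at the tail of the sequence, which is exactly what makes the subsequent entropy coding effective.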
The JPEG transform is for the moment the most commonly used image compression standard. However, the situation may change in the near future as the new JPEG2000 (ISO 15444) standard gains ground. The JPEG2000 standard uses wavelet transforms instead of the DCT, and it is claimed to be able to compress images up to 200 times with no appreciable degradation in quality. [16]
Figure 9. A zigzag scan of quantized DCT coefficients.
3. WATERMARKING METHODS
Methods to embed watermarks are not limited to one domain; a watermark can be embedded in almost any transformation domain available. There can even be multiple watermarks embedded in the image: some in the same domain and some in different domains. Explaining all the different watermarking
methods proposed is practically impossible, and therefore, only some of the most
important methods concerning our application are explained. The first section
explains the basic watermarking scheme, the second section is actually a brief
literature survey of the previously proposed methods in different domains and the
third section is about multiple watermarking.
3.1. Generic watermarking scheme
A watermark can be embedded in an image in many ways. Some researchers exploit the properties of existing transform domains; others create transform domains of their own with the properties they need. Here I shall present some methods used in the different domains, focusing mostly on blind, print-scan attack resilient watermarking methods. In blind watermarking methods, the original image is not needed in the extraction process, whereas in non-blind ones the original image is required.
Before embedding the watermark, the pixels of an image are usually divided into
luminance and chrominance components. It is possible to embed the watermark in
the colour information, but the most common way is to use the luminance information.
[2]
The watermark itself is usually a pseudorandom noise signal consisting of the
integers {-1, 0, 1}, and the amplitude of the signal is low compared to the image
amplitude to prevent the watermark from being visible. The only constraints are that
the watermark signal should not correlate with the image content and the energy in
the pseudorandom signal should be uniformly distributed. The most straightforward
way to embed a watermark is thus to add the pseudorandom signal with a suitable
gain factor to the luminance values of the pixels of an image. [2]
The basic watermark embedding process is illustrated in Figure 10, where the watermarked image $I_W(x, y)$ is obtained by adding the pseudorandom sequence W(x, y) to the original image I(x, y). The corresponding formula is

$$I_W(x, y) = I(x, y) + kW(x, y), \qquad (24)$$

where the pseudorandom sequence W(x, y) is multiplied by a small gain factor k. [2]
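A minimal sketch of equation (24) in Python, assuming a {-1, 0, 1}-valued pseudorandom sequence generated from a secret key (the key-based generation is an implementation choice, not prescribed by [2]):

```python
import numpy as np

def embed(image, key, k=2.0):
    """Additive embedding of equation (24): I_W = I + k * W, where W is a
    pseudorandom {-1, 0, 1} sequence derived from a secret key."""
    rng = np.random.default_rng(key)
    w = rng.integers(-1, 2, size=image.shape)   # values in {-1, 0, 1}
    return image.astype(float) + k * w, w
```

With a small gain factor such as k = 2, the change to each luminance value stays well below typical visibility thresholds.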
The previously embedded watermark can be detected by calculating the cross-correlation

$$R_{I,W}(i, j) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} I_W(m, n)\, W^*(m + i, n + j) \qquad (25)$$

between the possibly watermarked image $I_W(x, y)$ and the complex conjugate of the pseudorandom sequence W(x, y).

Figure 10. Generic watermark embedding procedure.

If the result of the correlation exceeds some
predefined threshold, the watermark is detected.
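A correlation detector in the spirit of equation (25), simplified to the zero-shift case and with the image mean removed first (a common practical choice, not taken from [2]):

```python
import numpy as np

def detect(image, w, threshold=0.5):
    """Correlate a possibly watermarked image with the known sequence w
    (equation (25) at zero shift) and compare against a threshold."""
    img = image.astype(float)
    img = img - img.mean()                  # remove the image's DC component
    corr = np.sum(img * w) / np.count_nonzero(w)
    return corr > threshold, corr
```

For an image actually carrying the watermark with gain k, the normalized correlation concentrates near k, while for an unmarked image it stays near zero; the threshold is set between these two values.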
The detector can make two kinds of errors. It may detect a watermark even if there is none, an error known as a false positive, or it may fail to detect a watermark that is present, an error called a false negative. Generally, false positives are considered worse than false negatives, because if an existing watermark is not found, the image can be checked again and again, whereas a false positive cannot be corrected: the watermark is assumed to be detected even though it is not. [2]
By using the aforementioned method, only one bit can be embedded. To increase the payload, the image can, for example, be divided into several blocks or sub-images, with one bit of an information string embedded in each of these sub-images, as did Smith and Comiskey [17]. Figure 11 [2] illustrates a similar method.
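The block-wise idea can be sketched as follows. For clarity, this illustration extracts non-blindly, using the original image, which differs from the blind setting discussed earlier; the function names and parameters are hypothetical.

```python
import numpy as np

def embed_bits(image, bits, key, k=2.0, block=8):
    """Embed one bit per non-overlapping block: +kW for bit 1, -kW for bit 0."""
    out = image.astype(float).copy()
    rng = np.random.default_rng(key)
    w = rng.choice([-1.0, 1.0], size=(block, block))   # per-block pattern
    n_cols = image.shape[1] // block
    for idx, bit in enumerate(bits):
        r, c = divmod(idx, n_cols)
        sign = 1.0 if bit else -1.0
        out[r * block:(r + 1) * block,
            c * block:(c + 1) * block] += sign * k * w
    return out

def extract_bits(marked, original, n_bits, key, block=8):
    """Recover the bits by correlating each block of the difference image
    with the same pseudorandom pattern (non-blind, for illustration)."""
    rng = np.random.default_rng(key)
    w = rng.choice([-1.0, 1.0], size=(block, block))
    diff = marked - original.astype(float)
    n_cols = original.shape[1] // block
    bits = []
    for idx in range(n_bits):
        r, c = divmod(idx, n_cols)
        corr = np.sum(diff[r * block:(r + 1) * block,
                           c * block:(c + 1) * block] * w)
        bits.append(1 if corr > 0 else 0)
    return bits
```

The payload grows linearly with the number of blocks, at the cost of fewer samples per bit and thus a less reliable correlation decision for each.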
3.2. Domains
This section focuses on some of the most common watermarking domains. First, some methods embedding the watermark in the spatial domain are dealt with; then, some methods working in the Fourier domain and the wavelet domain are discussed in a similar way. The last section is for methods working in other domains not mentioned here.
Figure 11. Embedding watermark in blocks.
3.2.1 Spatial
Nowadays, a robust watermark is required to hold on through many kinds of attacks,
out of which geometrical attacks are considered the most difficult ones to recover
from. Kostopoulos et al. [18] tried to solve this problem by embedding multiple
cross-shaped patterns in the image. Their method seemed to improve the robustness
of a watermarked image against small amounts of rotation, translation and scaling,
but it was very vulnerable to noise and more sophisticated attacks.
Methods that rely on a synchronization template, like the method by Kostopoulos et al., are not generally valued highly, for it is easy for an attacker to remove the template, after which the watermark cannot be read. Template embedding methods are, nevertheless, a very robust way to recover from geometric distortions, and when designing value-adding services, only unintentional attacks must be considered; consequently, template removal attacks are not expected.
A large number of template embedding methods have been proposed and studied
for their great ability to recover from geometrical distortions. Kutter [19] proposed a method for recovering from general geometric transformations. The idea of this method was multiple embedding of the same watermark at shifted locations in the image. The method can be seen as a form of spread spectrum watermarking, except that an extra prediction step was used to estimate the embedded watermark and thus increase the performance of the detector. Watermarks could then be predicted and correlated to determine the geometric transformation.
Deguillaume et al. [20], too, proposed a method based on repetition. In the method, a periodic pattern was embedded in an image in order to get a high number of peaks after autocorrelation of the magnitude spectrum of the Fourier transform. After autocorrelation, a Hough or Radon transform could be applied to determine the regular grid-shaped template. From the orientation of the grid, it was possible to determine the parameters of the general affine transform applied to the image.
3.2.2 Fourier domain based methods
Fourier transform is one of the most famous transforms used in signal processing. It
was named after Joseph Fourier, a French mathematician and physicist, who lived
during Napoleon's time and was the first to suggest that any function of a variable
can be expanded in a series of sines of multiples of the variable. This was not true,
however, but the suggestion that it might be true, even partially, was a breakthrough.
[21]. The two dimensional discrete Fourier transform (2D-DFT) of f(i,k) is defined as
$$F(m, n) = \sum_{i=0}^{N-1} \sum_{k=0}^{M-1} f(i, k)\, e^{-j2\pi\left(\frac{mi}{N} + \frac{nk}{M}\right)}, \qquad (26)$$

where f(i,k) is an N-by-M array and $j^2 = -1$. The result F(m,n) is a complex signal,
with real and imaginary parts, out of which the magnitude and phase of the Fourier
transform can be determined. The magnitude and phase of a Fourier transform are
described respectively as
$$|F(m, n)| = \sqrt{F_{\mathrm{Re}}^2(m, n) + F_{\mathrm{Im}}^2(m, n)}, \qquad (27)$$

$$\varphi(m, n) = \tan^{-1}\!\left(\frac{F_{\mathrm{Im}}(m, n)}{F_{\mathrm{Re}}(m, n)}\right), \qquad (28)$$
where $F_{\mathrm{Re}}$ is the real part of the transform and $F_{\mathrm{Im}}$ is the imaginary part. [22]
The inverse transform is defined as
$$f(i, k) = \frac{1}{NM} \sum_{m=0}^{N-1} \sum_{n=0}^{M-1} F(m, n)\, e^{j2\pi\left(\frac{mi}{N} + \frac{nk}{M}\right)}. \qquad (29)$$
From these equations it can be seen that a rotation in the spatial domain corresponds to a rotation in the frequency domain, that is,

$$f(x\cos\theta - y\sin\theta,\ x\sin\theta + y\cos\theta) \longleftrightarrow F(u\cos\theta - v\sin\theta,\ u\sin\theta + v\cos\theta), \qquad (30)$$

and scaling in the spatial domain corresponds to scaling in the frequency domain, that is,
$$f(ax, by) \longleftrightarrow \frac{1}{|ab|}\, F\!\left(\frac{u}{a}, \frac{v}{b}\right). \qquad (31)$$
Because of the properties of the Fourier domain presented above, the Fourier transform is a powerful tool in watermarking. One of the most difficult and crucial phases in detecting a watermark is finding the watermark in a distorted image. When using the Fourier domain, the problem is reduced significantly by noting that the magnitudes of the Fourier domain are invariant to translations in the spatial domain, but not to rotation and scaling [22]. A translation in the spatial domain is only a phase shift in the frequency domain.
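The translation invariance of the DFT magnitudes is easy to verify numerically. Note that a circular shift is the idealized case; a real translation of a photographed image also crops and pads content at the borders.

```python
import numpy as np

# A circular shift in the spatial domain changes only the phases of the DFT;
# the magnitudes are unchanged.
rng = np.random.default_rng(1)
img = rng.random((32, 32))
shifted = np.roll(img, shift=(5, 9), axis=(0, 1))

mag = np.abs(np.fft.fft2(img))
mag_shifted = np.abs(np.fft.fft2(shifted))
```

The two magnitude arrays agree to floating-point precision, while the complex spectra themselves differ by the shift-dependent phase factor.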
Some research has been done on the watermark templates in the frequency domain.
Pereira and Pun [10] embedded a template in the middle frequencies of the Fourier
domain magnitudes. The template they used did not contain any information but was
merely used for detecting the transformations the image had gone through. The
template consisted of approximately 14 points embedded in the magnitudes of
Fourier domain. The points were embedded uniformly along two lines at different
angles, and by finding these lines in the transformed and watermarked image, the
amount of rotation and scaling could be determined. The actual message was
embedded by using spread-spectrum methods into the Fourier domain between two
radii occupying a mid-frequency range. Their method has been criticised because the
template is easy to remove and thus the actual watermark is also lost. The method is
nevertheless quite robust against rotation, scaling and noise. [10]
Another similar method was proposed by Lee and Kim [23]. They embedded a
pseudorandom sequence into the middle frequencies of the input image as in Figure
12 and used cross-correlation at different radii to find the sequence, as illustrated in Figure 13.

Figure 12. Composing the circular template.

Since the sequence was pseudorandom, they could derive the amount of
rotation by finding the position of the cross-correlation peak. The drawback of this
method was that the rotation angle could only be calculated at a precision of 1° and
the amount of translation could not be found. On the other hand, the method is fairly
fast and relatively simple to use.
Figure 13. a) The spectrum of the watermarked image. b) Searching
of the template.
The magnitudes of the Fourier domain are generally used for their invariance to
translation in spatial domain. In some papers the idea of the invariance in Fourier
magnitude domain has been developed further and domains that are invariant to
translations, rotations and scalings have been researched. O'Ruanaidh and Pun used
Fourier-Mellin transform based invariants for watermarking [11]. There is one
drawback in this method, however: it works only against rotation, scaling and
translation distortions and not against aspect ratio changes or shear, for example. For
more information about watermarking in the spatial and frequency domains, see the paper by Hartung and Kutter [24].
3.2.3 Wavelet domain based methods
From the Fourier transform, we know that most signals can be expressed as a series of sines and cosines. The Fourier transform is an efficient way to analyze a signal, but even if we knew all the frequencies in a signal, we would not know when the frequencies are present. The solution is to divide the signal into small segments and analyse them separately. After that, we have some knowledge of when and where the frequencies appeared. In dividing the signal, however, we come up against Heisenberg's uncertainty principle, which states that it is not possible to determine simultaneously both the exact frequency and the exact time of occurrence of that frequency in a signal.
The problem seems to be unsolvable, but the wavelet transform offers a possible
solution. The wavelet transform employs a fully scalable modulated window, which
is shifted along the signal and the spectrum is calculated for every position. The
process is then repeated multiple times with a slightly different length of the window.
The final result is a collection of time-frequency representations with different
resolutions of the signal, the so-called multiresolution analysis. [25]
The discrete wavelet transform of f(m) is usually defined as
$$\gamma(s, \tau) = \sum_{m=0}^{N-1} f(m)\, \psi^*_{s,\tau}(m), \qquad (32)$$
where * denotes complex conjugation. The formula describes how the function f(m) is decomposed into a set of basis functions $\psi_{s,\tau}(x)$, called the wavelets. A set of wavelet basis functions $\{\psi_{s,\tau}(x)\}$ can be generated by translating and scaling the basis wavelet $\psi(x)$ as

$$\psi_{s,\tau}(x) = \frac{1}{\sqrt{s}}\, \psi\!\left(\frac{x - \tau}{s}\right), \qquad (33)$$

where s is a scale factor, $\tau$ is the translation factor and the single basic wavelet $\psi(m)$ is the so-called mother wavelet. [25]
One of the first wavelet transforms, and probably the most applied, is the Haar transform, which was invented before the term wavelet existed. It is the simplest possible wavelet, and its wavelet function is of the form [25]

$$\psi(x) = \begin{cases} 1, & 0 \le x < 0.5 \\ -1, & 0.5 \le x < 1 \\ 0, & \text{otherwise.} \end{cases} \qquad (34)$$
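One level of the Haar transform can be written out directly; a sketch using the orthonormal scaling by 1/√2:

```python
import numpy as np

def haar_step(signal):
    """One level of the Haar DWT: pairwise scaled sums give the approximation
    coefficients, pairwise scaled differences the detail coefficients."""
    x = np.asarray(signal, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_inverse(approx, detail):
    """Invert one Haar step, reconstructing the original signal exactly."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x
```

Applying the step recursively to the approximation coefficients yields the multiresolution decomposition described above; the orthonormal scaling preserves the signal energy across levels.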
The wavelet domain is neither translation nor rotation invariant, but it is often used because of the many advantages it has compared to other domains. In the Fourier domain, the transform applies sinusoidal waves as basis functions, and thus the Fourier transform is only localized in frequency. By contrast, wavelets are described as waves of limited duration and are therefore localized in both time and frequency. This space-frequency representation is good at localizing image features, such as edges and textured areas, which might be neglected when working in the Fourier domain. [22, 26]
Another main advantage is the wavelet domain's superior HVS modelling capabilities compared with other domains. A reason for this is the similarity of the wavelet transform to the multiple channel models of the HVS. The frequency decomposition of the wavelet transform resembles the signal processing of the HVS in that both divide the image into frequency channels that respond to an explicit spatial location, a limited band of frequencies and a limited range of orientations. [26]
The wavelet transform of an image is also usually fast to calculate. This is due to its
linear complexity O(n), compared, for example, with the DCT (Discrete Cosine
Transform) applied over an entire image, which has a complexity of O(n log n).
Transmitting a transformed image is also fast, because the multi-resolution
representation of the image allows hierarchical processing to be done in a
straightforward way. [26]
Normally, wavelet watermarking methods are categorized by the wavelet
coefficients in which the watermark is embedded, the main distinction being between
the approximation coefficients, which contain the low-frequency information, and the
detail sub-bands, which represent the high-frequency information in the horizontal,
vertical and diagonal orientations. These detail sub-bands are shown in Figure 14. [26]
Figure 14. a) Wavelet coefficients calculated with Haar function in Matlab. b)
Structure of the wavelet coefficients in the image a).
Barni et al. [27] embedded a binary pseudorandom sequence in the DWT (Discrete
Wavelet Transform) coefficients of the three largest detail sub-bands of the image,
that is, the vertical detail (LH1), horizontal detail (HL1) and diagonal detail (HH1)
sub-bands, using visual masking so that the watermark could be embedded with
maximum energy. The watermark was detected by correlating the marked wavelet
coefficients with the watermarking sequence. The detection results were particularly
good under image cropping: because the similarity of the DWT decomposition to the
models of the HVS allows the watermark energy to be kept as high as possible, even
small portions of the image were sufficient to correctly detect the embedded code.
Watermarking in the wavelet domain resembles watermarking in the spatial domain:
many of the basic techniques and methods used in the spatial domain can also be
employed in the wavelet domain. Another aspect to be considered when designing a
watermarking system in the wavelet domain is the upcoming JPEG2000 standard, which
works in the wavelet domain. That is, however, only one of the reasons why the wavelet
domain appears to be so attractive right now.
3.2.4 Methods in other domains
Spatial, Fourier and wavelet transforms are not the only transformation domains that
have been used in the field of digital watermarking. Many other domains have been
researched and their qualities investigated and exploited. Most of the other domains,
however, are variations or extensions of well-known Fourier or wavelet domains.
The Hadamard transform is an example of a generalized class of Fourier transforms. The
difference between the two transforms is that the basis functions of the Hadamard
transform are variations of a square wave rather than sinusoids. The Hadamard
transform has only 1 and -1 as elements in its kernel matrix, and this simplicity
gives the Hadamard transform a significant advantage in processing time over some
other transforms.
Quite a few studies have been published concerning the Hadamard transform in
watermarking, and one of them is a method proposed by Gilani and Skodras [28]. In
their technique, the watermark is embedded in the perceptually most significant
spectral component of an image. The image is first Haar wavelet transformed and
then the lowest frequency band is Hadamard transformed. The result is then zigzag
scanned and the watermark is embedded in those coefficients. The extraction method
is fairly similar but the zigzag scanned coefficients are cross-correlated with the
watermark generated by a secret key.
Some research has also been done in Gabor [29] and Fresnel [30] transform
domains, but the research in these domains has not evolved a great deal. The
methods developed are not robust enough against print-scan attacks or geometrical
distortions. Another promising domain is the fractional Fourier domain, which is a
generalization of the classical Fourier transform. Not much watermarking research
has been done in this domain because the idea behind it is similar to that of the
wavelet domain, and so it remains to be seen whether it is a suitable domain for
watermarking.
Many Discrete Cosine Transform (DCT) based watermarking algorithms have
also been proposed. Some of them use block-based algorithms, where the image is
divided into blocks and the watermark is embedded in those blocks. These methods,
however, are generally not robust against geometric transformations and are not
examined here.
Various transform domains exist, of which only some have been studied for
watermarking. All of them have good or even superior qualities but also some
drawbacks. Therefore, if a transform domain is to be used, it must be selected carefully,
and the properties of the environment where the watermark will be used must be kept in
mind.
3.3. Multiple watermarking
Multiple watermarking means, in short, embedding more than one watermark in an
image. Lähetkangas [32] studied the problem of multiple watermarking in her Master
of Science thesis. She studied cases where there were multiple watermarks and
multiple users who wanted to embed information in digital images. To analyse
different multiple watermarking techniques, she developed a new classification
system for multiple watermarking methods. The previous classification
system was developed by Sheppard et al. [33], who divided the multiple
watermarking methods into three classes: re-watermarking, segmented watermarking
and composite watermarking.
In re-watermarking, the multiple watermarks are embedded by adding them one by
one on top of each other. This method is fast and simple, but it can also be used as an
attack in some circumstances: the watermark embedded last can destroy
previously embedded watermarks, so the watermarking methods must be chosen
carefully. Another drawback of this method is that every embedded
watermark decreases the quality of the watermarked image, and consequently the
PSNR value also drops. [33]
Another way to embed multiple watermarks in an image is to divide the image into
segments and embed watermarks each in its own segment. This is called segmented
watermarking, and it does not degrade the image more than embedding only one
watermark. It has some limits, however: as the number of segments rises,
their size decreases, and embedding watermarks in smaller segments becomes harder.
[33]
A third way to use multiple watermarking is to use composite watermarking by
building a composite watermark from a collection of watermarks. The watermarks
can be for example pseudo random sequences that are combined and then embedded
in the image as usual. The composite watermark will be separable if the different
watermarks are orthogonal or, as in the case of pseudorandom sequences,
uncorrelated. [33]
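Why uncorrelated sequences stay separable can be illustrated with a small sketch (hypothetical ±1 sequences, not from any cited scheme): two independent pseudorandom watermarks are summed into a composite, and each is still recovered by correlation because independent sequences of this kind are nearly uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
N = 4096

# Two independent +/-1 pseudorandom watermarks; such sequences are
# (nearly) uncorrelated with each other, so they remain separable.
w1 = rng.choice([-1.0, 1.0], size=N)
w2 = rng.choice([-1.0, 1.0], size=N)

composite = w1 + w2  # the composite watermark that would be embedded

def correlate(candidate, signal):
    """Normalized correlation used as a simple detector statistic."""
    return float(np.dot(candidate, signal)) / len(signal)

# Each embedded watermark correlates strongly with the composite,
# while an unrelated sequence correlates close to zero.
w3 = rng.choice([-1.0, 1.0], size=N)
print(correlate(w1, composite), correlate(w2, composite), correlate(w3, composite))
```

The first two statistics come out near 1 and the third near 0, which is exactly the separability property the composite approach relies on.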
Lähetkangas motivated her work with value-adding watermarks in addition to
DRM (Digital Rights Management) problems and discussed multiple watermark
hiding methods from various points of view. She, too, divides the
multiple watermarking methods into three classes, but this time the classes are the
basic algorithm, methods to divide the watermarking space, and multiple watermark
hiding techniques. [32]
The basic algorithm is a watermarking algorithm that is applied to hide one
watermark once. The basic algorithms used in multiple watermarking form the basis
and set limits for performance. It is possible to take advantage of the properties of
multiple basic algorithms in multiple watermarking. [32]
Instead of classifying re-watermarking and segmented watermarking separately
[33], Lähetkangas [32] combines them under methods to divide the watermarking space.
She states that these methods define whether the multiple watermarks are embedded in
the content on top of each other or in parallel with each other.
The third class of Lähetkangas' classification system is multiple
watermark hiding techniques. They define the order in which the watermarks are
embedded and who is embedding the watermarks. In some applications there might
be several users who want to embed watermarks to prove ownership and rights to use.
For example, the creator of the image may want to embed creator information in the
image, whereas the distributor of the content may wish to embed copyright
information in the image. Some information might be protected, and therefore
not everyone can be allowed to access the embedded information as is. [32]
Most commonly the multiple watermarking algorithms are applied to enhance the
robustness of a watermark, but the development of the digital world has brought new
application areas. In the digital world, the media should be playable on different
platforms and devices and with different programs. The multiple watermarking
techniques can be applied to help the adaptation of the content to the various
environments by embedding watermarks in the content, each of which can contain
information about the functionality of the content, settings required and programs
needed [32].
4. COMMERCIAL DEVELOPMENT
Although digital watermarking has been around only for a little while, some
commercial applications have been initiated, Digimarc being probably the most
famous one. Here some of the commercial applications in value-adding
watermarking are introduced.
4.1. Digimarc
Digimarc Corporation is based in Beaverton, Oregon, but it has international offices
also in London and Mexico. Digimarc is a developer of digital watermarking
solutions and it offers security and brand protection solutions to global corporations
and government entities. Although Digimarc has focused mainly on digital rights
management issues, it has launched a different kind of initiative based on
watermarking to enhance mobile computing and commerce. [34, 35]
The goal of Digimarc's initiative is to provide a service for camera phone users
to navigate from printed materials to a URL for a website with one click. That is, the
printed material contains a watermark that can be read with a camera phone. The
phone then recognizes the image and sends it to Digimarc's registry to determine
what to do with it: whether to direct the user to some website or to an e-commerce
application. The registry contains information about the user, for example how he
wants to pay for his purchases and so on. To be able to use the service, the user must
have downloaded Digimarc's client to his phone. [35]
Digimarc has acknowledged the problem that, unlike a barcode, an invisible
watermark is not apparent to the naked eye. This raises the question of how to let
consumers know about the watermark in the materials. Digimarc's solution to
this is, at least initially, to partner with an e-commerce or catalogue company, since
the users of a catalogue are assumed to be comfortable using a device to select
catalogue items. [35]
Digimarc's initiative has aroused interest at least in Japan, where MediaGrid
has licensed Digimarc's technology. On July 25, 2006, Digimarc announced the launch
of a digital watermarking pilot in Japan in the Amusement Café "Maid in Japan" café.
The idea of the pilot is to offer customers with camera phones the possibility to interact
with digitally watermarked print materials. The materials may contain links to online
content such as a theme-oriented city guide or mobile phone wallpaper featuring
favourite characters. The pilot was rolled out by MediaGrid and Success Corporation,
a leading developer of games and video games in Japan. [34]
4.2. CyberSquash
CyberSquash is developed by NTT (Nippon Telegraph and Telephone Corporation)
Cyber Solutions Laboratories. It is defined as an Internet Access Platform that makes
use of watermarking technologies. In this system, a watermark, indicating a URL
for a desired homepage, is embedded in a printed image, which can be read with a
web camera or a mobile phone with an i-appli digital camera. The image is then
processed and the user is directed to the specified homepage. [36]
There are two types of CyberSquash software used for reading the
watermarks: an ActiveX version and an i-appli version. The ActiveX version is developed
to read watermarks with a web camera on a PC, and the i-appli version is developed to
read watermarks with a mobile phone equipped with a digital camera. The i-appli
version works only on NTT DoCoMo's i-mode mobile phones and is written in the
Java programming language. [36]
In CyberSquash, the watermark is embedded in the image in four phases. First, error
correction coding is applied, and the resulting code is modulated by using Direct
Sequence Spread Spectrum (DS-SS) modulation. The modulated code is then
permuted with a pseudorandom sequence to reduce the imbalance of robustness among
bits. In the third step, the modulated and interleaved code is embedded in the image
by applying two-dimensional pattern modulation in small blocks, as illustrated in
Figure 15, where the patterns are two two-dimensional sine curves with 90°
rotational symmetry. The actual embedding is done by multiplying the watermark
pattern with an embedding strength factor and superposing it on the original image.
Adaptive pattern superposition can also be employed to improve
the balance between image quality and the robustness of the watermark. [37]
Figure 15. 2D Pattern modulation in CyberSquash application.
The method presented above is not robust against geometric distortions, and
therefore the authors placed a frame around the watermarked image to recover
synchronization. The frame also works as an indicator showing that a watermark
has been embedded. After the image has been read with a camera, the frame is
recognized and the four corner points are located. From these locations, the parameters
of the affine transform and the scale can be determined. The determined parameters are
approximations rather than exact values, and thus the corrected image may
contain small geometric distortions. The embedded watermark is designed to be
robust against such small distortions. [37]
In the watermark detection, the scaled and geometrically corrected image is
filtered with a pre-processor to increase the robustness of the watermark. After
filtering, the image is divided into small individual blocks and the energy of the
frequencies corresponding to the two sine curves is calculated for each block. By
calculating the difference between the two energy levels, the sign of the embedded
sequence is obtained. Once the embedded sequence is determined, it is descrambled
with the pseudorandom permutation and demodulated. The last step is to put
the sequence through an error correction process. [37]
The CyberSquash trial was initiated in 2003 and was planned to run for six
months. Since the trial, however, the CyberSquash application has disappeared from
the news headlines, and it seems that its development has stopped.
4.3. Bar codes
Bar codes are not really watermarks, but the application areas of bar codes are so
similar to those of watermarking that it is necessary to introduce them. The first
section contains a short history of bar coding and a description of the
applications where they are used. The second section explains how bar codes work.
4.3.1 Short history
Bar codes are usually thought of as a rival of watermarking in the field of value-adding
services, because they can be used in similar applications. There are, however, many
applications where either one is clearly more suitable. For example, in catalogues
where space is scarce, watermarking is clearly a better solution than bar codes. On
the other hand, in advertising where the user needs to be informed separately about
the extra data included in the image, a bar code might be more suitable than
watermarking.
Bar codes were invented in 1949, when a young graduate student, Joseph
Woodland, idly drew some dots and dashes in the sand. He was trying to figure out
how to automatically read information about a product, and he knew that Morse code
was the key to solving the problem. While lying on the beach, he finally understood
how it should be done, and so the idea of bar codes was born. [38]
Joseph Woodland and his partner, Bernard Silver, received a patent on bar codes in
1952 (US Patent 2,612,994), but it was not a rapid commercial success. Although the
idea was ready for the commercial world, the technologies needed in bar
code scanners were expensive or yet to be invented. It took fifteen years before the first
commercial use of bar codes, and it was in the mid-seventies when bar codes finally
came into the stores. This was enabled by the invention and development of lasers
and integrated circuits, which became affordable in the 1960s and made bar code
scanners simple and profitable. [39]
One of the first standards created was the UPC (Universal Product Code), now
officially known as EAN.UCC-12 (International–Uniform Code Council), which is still in
use in the USA and Canada. In the early 1970s, the US grocery industry was trying to find a
way to reduce costs. They reasoned that automating the grocery checkout process
could do this, and after two years of effort they announced the UPC and the UPC bar code
symbol on April 1, 1973. The first item bought using this system was a package of
Wrigley's gum, sold in Marsh's Supermarket in Troy, Ohio on June 26, 1974. [38]
Nowadays, bar codes are used in multiple ways, and programs that read bar codes
have even been published for camera phones. The critics who favour
watermarking over bar codes claim that bar codes are ugly and require extra
space. At present, however, bar codes can contain more information than
watermarks, and they are more robust in mobile applications. Technology has surely
advanced since the days when the first packet of gum was sold with a bar code.
4.3.2 Operation
Bar codes are often described as a machine-readable representation of information
printed on some surface. The traditional bar code consists of bars and spaces of
alternating diffuse reflectivity, usually black and white parallel stripes, as illustrated
in Figure 16. The bar codes in the figure were generated with a bar code online demo
[40]. The information in a bar code is encoded in the bars and spaces along one
dimension, the horizontal, so the vertical height of the bar code has no
specific meaning; it only makes reading the bar code easier. [41]
Figure 16. Examples of UPC-A bar codes.
There are two main ways to encode information in bar codes. The first is to
divide the code into 1s and 0s and then paint the 1s as black bars and the 0s as white
spaces, as in Figure 16, which shows examples of UPC-A codes. The second way
is to use width coding, that is, to assign each bit to a bar or space and make that element
wide if the bit is 1 and narrow if the bit is 0. For example, the bar code standard Code 39
is a width code. [41]
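Width coding can be illustrated with a toy sketch (not the actual Code 39 symbology, whose start/stop characters and exact element patterns are omitted here): each bit becomes one bar or space element whose width carries the bit value.

```python
def width_encode(bits):
    """Toy width-coding sketch: each bit becomes one element,
    alternating bar/space; a 1 gives a wide element (3 units),
    a 0 a narrow one (1 unit)."""
    elements = []
    for i, bit in enumerate(bits):
        kind = "bar" if i % 2 == 0 else "space"
        width = 3 if bit == 1 else 1
        elements.append((kind, width))
    return elements

def render(elements):
    """Render the elements as a one-line scanline: '#' = bar, '.' = space."""
    return "".join(("#" if kind == "bar" else ".") * width
                   for kind, width in elements)

print(render(width_encode([1, 0, 1, 1])))  # prints "###.###..."
```

A scanner recovers the bits by measuring element widths rather than counting fixed-width modules, which is what distinguishes width codes from codes like UPC-A.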
The encoded information can be read by various technologies. The most common
ones are cameras and lasers and although the technologies are different, the idea is
the same: when the scanner reads a bar code, it detects only reflections of light. The
black stripes will not reflect any light whereas the white stripes reflect most of the
light back. [41]
As technology evolved, it was realized that the traditional one-dimensional
bar codes were not good enough. Data Matrix is a two-dimensional bar code standard
consisting of black and white dots, as in Figure 17. A Data Matrix code includes
four basic elements: two solid-line locators, two synchronization lines, a data area and
a quiet zone. The data area contains the encoded binary data, whereas the
quiet zone is the empty narrow area around the data matrix. The two solid-line
locators are solid perpendicular lines that indicate the data area boundaries and the
orientation of the data matrix. The two synchronization lines opposite the solid-line
locators indicate the sample modules. [42]
Figure 17. Data Matrix bar code encoding "MediaTeam".
Bar codes have been used in practically every imaginable application. They have
been attached to groceries, aeroplanes, cars and images, and have even appeared in
fashion and tattoos. They have been used for monitoring movement and merchandise
and for tracking objects. Bar codes have been around for a long time, but their story is
far from over.
4.4. Mobot
Mobot does not use watermarking technologies for offering value-adding services,
but it is worth mentioning for the wide publicity it has received in the USA. Mobot does
not require any kind of barcode, logo or special symbol, nor does it need any kind of
client software in the mobile device [35]. Instead of watermarks, Mobot's solution is
based on image recovery, pattern recognition and image matching capabilities. This
enables Mobot to support all camera phones on the market regardless of camera
accuracy. [43]
The user only needs to snap a photo of an interesting ad with his/her camera phone
and send it to the Mobot server, which then analyses the image and in turn sends the
user whatever the advertiser wants him/her to receive. The data the user receives could
be, for example, a coupon, a giveaway or additional information about the product, but
for consumers to actually receive the giveaways and offers from the advertisers, they
must first register with the company. All this is already in use in Jane magazine,
a magazine for young women, which has launched the promotion "Jane talks back". [35]
4.5. Sanyo
A Japanese electronics company, Sanyo, has also done some research on
watermarking. Takeuchi et al. from Sanyo Electric Co., Ltd. proposed a method for
reading watermarks from printed images with a camera phone. In their method, the
actual watermark is embedded with guided scrambling (GS) techniques. Unlike many
other watermarking methods that are tested with cameras, the method by Takeuchi et
al. also compensates for radial distortion. The coefficients of the correction model
are calculated beforehand by using a chessboard calibration pattern. The paper
assumed that the coefficients would not change between phones of the
same model, and a database was created in which the coefficients could be looked up
by the product name of the camera phone and the focal length used during photo
acquisition. The perspective distortion was compensated for by calculating the four
corners of the image. Unfortunately, the more detailed publications on this method are
written in Japanese, and the information is thus not available to an international
audience. [44]
5. PRINT-SCAN RESILIENT WATERMARKING
Printing and scanning of an image produces a set of distortions to the watermarked
image, as explained in section 2.3.3. In this chapter, a watermarking method is
proposed which is resilient to print-scan attacks. The block diagram of the
watermarking system is shown in Figure 18. The proposed method consists of three
parts: three separate watermarks. The first watermark is embedded in the frequency
domain to recover the image from rotation and scaling attacks. The second
watermark is embedded in spatial domain to recover from translation attack and the
third watermark is the multibit message which is embedded in the wavelet domain.
The last two sections of this chapter discuss the experiments performed and the results
achieved, respectively, to validate the method.
Every embedded watermark can be considered an attack against previously
embedded watermarks. The order in which the watermarks are embedded is therefore
chosen carefully, although other orders would be possible. Here, the multibit message
is the most fragile of the three watermarks and is therefore embedded last.
Figure 18. Block diagram of the proposed print-scan robust method.
5.1. Frequency domain template
The Fourier domain has an advantage over other domains in watermarking, which
may also be its drawback: invariance to translation. This property is used here
for determining the amount of rotation and scaling, because it is much easier to find the
watermark in the Fourier transform domain when translation need not be worried
about. The translation invariance, however, forces us to find a different method for
determining the amount of translation, which is introduced in section 5.2. The next
section explains the embedding process, and the one after it the extraction process.
5.1.1 Embedding
The template watermark is embedded in the magnitudes of the Fourier domain. The
first step before embedding is to transform the luminance values of the image into
the Fourier domain, which results in two Fourier images: the real and imaginary parts.
In the Fourier domain representation, the low frequencies are located in the corners of
the transformed image. Before processing the image, the low frequencies are moved to
the centre, and then the magnitudes of the transform are calculated.
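These preparation steps can be sketched with NumPy (an illustrative sketch; the function name is made up here): `fft2` places the low frequencies in the corners, `fftshift` moves them to the centre, and the magnitudes are then taken.

```python
import numpy as np

def magnitude_spectrum(luminance):
    """Centred magnitude spectrum of an image's luminance values.

    np.fft.fft2 places the low frequencies in the corners of the
    transform; np.fft.fftshift moves them to the centre, as described
    in the text, before the magnitudes are computed.
    """
    spectrum = np.fft.fft2(luminance)     # complex: real + imaginary parts
    centered = np.fft.fftshift(spectrum)  # low frequencies to the centre
    return np.abs(centered)               # magnitudes

img = np.random.default_rng(0).random((64, 64))
mag = magnitude_spectrum(img)
# The DC component (the sum of all pixels) now sits at the centre index.
```

After this step, the template peaks can be placed symmetrically around the centre of `mag`.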
[Figure 18 block labels: Host image → Embed template in Fourier domain → Embed template in spatial domain → Embed multibit message → Taking a picture → Invert rotation and scale → Invert translation → Extract the message]
Template
After the magnitudes of the Fourier transform have been determined, the template is
embedded. To recover from rotation and scaling distortions, a template is embedded
in the magnitudes of the Fourier transform of the image in a somewhat similar
manner to the method by Lee and Kim [23], where a pseudorandom template
sequence of length 180 bits is embedded in the middle frequencies of magnitudes of
Fourier transform. The template, a pseudorandom sequence of 1s and 0s, is
embedded in the middle frequencies of the image in the form of a sparse circle. This
process is illustrated in Figure 19. The points in the figure are exaggerated to make
them visible to the eye in printed material.
Figure 19. Embedding a pseudorandom sequence in the Fourier domain of an image.
The template is symmetrical about its origin because the magnitude component
of the Fourier transform is symmetric about the origin. The points on the circle are
added to the Fourier domain at angles π/20 apart from each other. The value π/20 is
chosen for convenience, but it could be different. The values of the pseudorandom
sequence that differ from 0 form peaks in the Fourier domain when embedded.
Therefore, not all the points of the pseudorandom sequence are visible; the 0s appear
as gaps in the circle. The strength at which the values are embedded varies with the
local mean and standard deviation. This is because the embedding strength clearly
needs to be larger close to the low frequencies, where, in general, the highest values
of the Fourier transform are found.
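A rough sketch of this embedding step is given below, under assumed parameter values: the strength factor `alpha` and the local 5×5 statistics window are illustrative choices, not those used in the thesis.

```python
import numpy as np

def embed_circle_template(magnitudes, bits, radius, alpha=10.0):
    """Sketch: add peaks on a circle of the given radius in a centred
    Fourier-magnitude image. A 1-bit becomes a peak, a 0-bit a gap,
    and each peak is mirrored through the origin because magnitude
    spectra are symmetric. The peak strength follows the local mean
    and standard deviation, so peaks near the energetic low
    frequencies are embedded more strongly."""
    out = magnitudes.copy()
    cy, cx = out.shape[0] // 2, out.shape[1] // 2
    step = np.pi / 20.0                  # angular spacing used in the text
    for k, bit in enumerate(bits):
        if bit == 0:
            continue                     # zeros appear as gaps in the circle
        angle = k * step
        y = int(round(cy + radius * np.sin(angle)))
        x = int(round(cx + radius * np.cos(angle)))
        local = out[y - 2:y + 3, x - 2:x + 3]   # 5x5 neighbourhood statistics
        out[y, x] = max(out[y, x], local.mean() + alpha * local.std())
        ys, xs = 2 * cy - y, 2 * cx - x          # symmetric counterpart
        out[ys, xs] = max(out[ys, xs], out[y, x])
    return out

rng = np.random.default_rng(1)
mags = rng.random((128, 128))
bits = np.tile([1, 0], 10)   # a 20-bit template over the half circle
marked = embed_circle_template(mags, bits, radius=30)
```

After the peaks are placed, the modified magnitudes are recombined with the original phase and inverse transformed back to the spatial domain.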
The decision to embed the values in the middle frequencies is a compromise. The low
frequencies of a Fourier transform contain most of the energy of an image. Therefore,
any changes made to the low frequencies are highly visible in the image, especially
since the watermark signal would have to be embedded very strongly to keep the
energy of the image from overwhelming it. On the other hand, the high
frequencies are very vulnerable to various kinds of attacks, for example JPEG
compression.
The result of the watermark embedding process can be seen in Figure 20, where a
small magnified piece of the image is shown in the upper left corner of each image.
The magnified portion shows the effect of watermark embedding more clearly than
the image itself. Some of the variation in the quality of the image will be flattened
during the printing process, and thus the watermark can be embedded more strongly
than would be possible when distributing the image in digital form.
Figure 20. a) Original image b) original image after embedding the watermark.
When the watermark peaks are embedded in the magnitudes of a Fourier transform,
the watermark spreads over the entire image. This enables the watermark to be
robust against slight cropping. Cropping, on the other hand, inflicts noise on the Fourier
domain, but if the template is embedded robustly enough, it will survive it.
5.1.2 Extracting
Figure 21 shows the image after the print-scan process, which has rotated and scaled
the image. To find the embedded template in the scanned image, the image is first
padded with zeros to its original geometry, a square. If the image were not padded with
zeros beforehand, the template circle would be stretched into an oval and the extraction
process would be more difficult.
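The padding step itself is simple to sketch (assuming a single-channel image array):

```python
import numpy as np

def pad_to_square(img):
    """Pad a scanned (non-square) image with zeros back to a square,
    so that circles in the Fourier magnitude stay circles instead of
    being stretched into ovals."""
    h, w = img.shape
    side = max(h, w)
    out = np.zeros((side, side), dtype=img.dtype)
    out[:h, :w] = img
    return out

scanned = np.ones((100, 120))   # e.g. a cropped scan, 100 x 120 pixels
square = pad_to_square(scanned)  # now 120 x 120, zero-padded below
```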
The extraction of the template from the Fourier transform domain is mainly about
locating the peaks. From here on we can think of the image itself as noise and the
watermark as the information to be preserved. To find the hidden information, we
must first filter out the noise, that is, the image.
Figure 21. The watermarked image after print and scan process in which
some distortions have occurred.
Wiener filtering
The first thing to do after calculating the magnitudes of the Fourier transform is to
use Wiener filtering. The Wiener filtering removes some of the noise and helps in
finding the peaks. To find the peaks, the Fourier transform of the image is Wiener
filtered and the filtered transform image is subtracted from the distorted transform
image. The Wiener filter minimizes the mean square error between an estimate \hat{f}
and the original image f:

e^{2} = E\left\{ \left( \hat{f} - f \right)^{2} \right\} . (35)
The Wiener filter is usually defined in the frequency domain by the formula

G(u,v) = \frac{H^{*}(u,v)}{\left| H(u,v) \right|^{2} + P_{n}(u,v)/P_{f}(u,v)} , (36)

where P_f(u,v) and P_n(u,v) are the power spectra of the original image and the noise,
respectively. In the formula, P_n(u,v)/P_f(u,v) can be replaced with a constant, which
can be approximated roughly beforehand. [22]
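A sketch of the filter-and-subtract step using SciPy's adaptive Wiener filter (the window size here is an illustrative choice; the thesis does not specify an implementation):

```python
import numpy as np
from scipy.signal import wiener

def filter_and_subtract(magnitudes, window=5):
    """Sketch of the step described above: Wiener-filter the magnitude
    image (scipy.signal.wiener is an adaptive local mean/variance
    smoother) and subtract the filtered version from the original,
    so that the slowly varying image content largely cancels out."""
    smoothed = wiener(magnitudes, mysize=window)
    return magnitudes - smoothed

rng = np.random.default_rng(2)
mags = rng.random((64, 64))
mags[20, 20] += 50.0   # a strong, isolated template peak
residual = filter_and_subtract(mags)
```

The residual image is then thresholded before the cross-correlation search, as described next.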
Finding peaks by cross-correlation
The Wiener filtering helps in finding the template peaks in the noisy
environment, and so the template can be found by using cross-correlation. To reduce
the noise, the Wiener-filtered image of the Fourier transform magnitude domain is
further thresholded before applying cross-correlation. The thresholding is applied so
that a point is selected as a one if the local mean around the point exceeds a certain
predefined limit.
There are two things that we know about the template: the pseudorandom
sequence and the fact that the template is shaped like a circle around the origin. What
we want to know are the value of the radius and the angle of rotation. The search for
the rotation and scale factors proceeds in two phases. In the first stage, a rough
estimate of the rotation angle and scale factor is determined, and in the second, finer
results are achieved. Since the Fourier transform magnitudes are invariant to shifts in
the spatial domain, it is enough to search for the template circle around the origin. To
find the circle, every possible radius must be searched. This could very easily lead to
an exhaustive search, but because the image is in digitized form, it is enough to first
search only integer-valued radii and find out the exact value later in the second stage.
It is not necessary to examine all the radii, because at the low frequencies the noise
from the image is overwhelming. When calculating a cross-correlation between a
pseudorandom sequence and a highly noisy signal, the result may show a high
correlation between the two signals even if there is none. Therefore some of the low-
frequency radii can be discarded, and the search area resembles an annulus between
two predefined frequencies f_1 and f_2, as in the paper by Pereira and Pun [10].
The detection of the template circle is performed as follows: first a radius is
selected, and a one-dimensional sequence corresponding to that radius in the Fourier
transform is extracted, as in Figure 13 [23]. The sequence is cross-correlated with the
pseudorandom sequence by using a cross-covariance function, which is related to
cross-correlation. The cross-covariance can be defined as the cross-correlation of
mean-removed sequences
c_{xy}(m) = \begin{cases} \sum_{n=0}^{N-m-1}\left(x_{n+m} - \frac{1}{N}\sum_{i=0}^{N-1} x_i\right)\left(y_n - \frac{1}{N}\sum_{i=0}^{N-1} y_i\right)^{*}, & m \geq 0 \\ c_{yx}^{*}(-m), & m < 0, \end{cases} \quad (37)
where x is a sequence of the image at some radius with length N and y is the
pseudorandom sequence interpolated to the length N. The maximum of the resulting
cross-covariance is saved to a vector. After the integer radii between frequencies f1
and f2 have been examined, the maximum is selected from the vector containing the
maximums of the cross-covariances, as shown in Figure 22.
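The search over radii can be sketched with NumPy: `np.correlate` applied to mean-removed sequences implements the m ≥ 0 branch of the cross-covariance in (37). The template length, shift, and noise level below are illustrative, not values from the thesis.

```python
import numpy as np

def cross_covariance(x, y):
    """Cross-covariance: the cross-correlation of the mean-removed
    sequences, as in equation (37) (m >= 0 branch)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    return np.correlate(x, y, mode="full")   # lags -(N-1) .. (N-1)

# Illustrative example: a radius sequence hiding a shifted copy of the
# pseudorandom template under mild noise (sizes and shift are made up).
rng = np.random.default_rng(0)
template = rng.choice([-1.0, 1.0], size=64)      # stands in for the m-sequence
radius_seq = np.roll(template, 10) + 0.1 * rng.standard_normal(64)
c = cross_covariance(radius_seq, template)
lag = int(np.argmax(c)) - (len(template) - 1)    # lag of the strongest match
```

The maximum of `c` is what the method stores in the vector of maximums, one entry per examined radius.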
When a rough estimation of the radius of the template circle is found, the locations
of the template peaks are extracted. The peaks are found by examining a band two
pixels wide around the previously found radius. Every point in this band is
examined by calculating a local mean around the point and deciding whether the point
is a sufficiently high peak or not. A point is selected as a peak if its value is three
times bigger than the local mean and if it is a maximum in that area.
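The peak test above (value at least three times the local mean, and a local maximum within its window) can be sketched as follows; the window size, band width, and helper name are assumptions for illustration.

```python
import numpy as np

def find_template_peaks(mag, radius, band=2, factor=3.0, win=5):
    """Scan a band of +/-`band` pixels around `radius` in the origin-centred
    magnitude spectrum `mag`, keeping points that exceed `factor` times their
    local mean and are local maxima in a `win` x `win` window (a sketch;
    parameter values are illustrative)."""
    h, w = mag.shape
    cy, cx = h // 2, w // 2
    half = win // 2
    peaks = []
    for y in range(half, h - half):
        for x in range(half, w - half):
            r = np.hypot(y - cy, x - cx)
            if abs(r - radius) > band:
                continue
            patch = mag[y - half:y + half + 1, x - half:x + half + 1]
            # local mean excludes the examined point itself
            local_mean = (patch.sum() - mag[y, x]) / (patch.size - 1)
            if mag[y, x] > factor * local_mean and mag[y, x] == patch.max():
                peaks.append((y, x))
    return peaks
```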
The difficulty of finding the peaks is obvious when looking at the magnitudes of
the Fourier transform as in Figure 23. The points are sharp and clear in the original
image but stretched and spread in the distorted image. The low frequencies are
strongly visible in the distorted image because some of the white scanned
background of the image is still in on calculations, whereas the original image
contains only the image and no scanned background.
Figure 22. The vector containing maximums of the cross-correlations.
Figure 23. Magnitudes of the Fourier transform of a) distorted image b) original
image.
Determining rotation and scaling parameters
After the peaks are found, they are transformed into polar coordinates and divided
into π/20-wide segments according to their angle. Extra points inside these segments
can be discarded, because we know that the points should lie at angles of about π/20
from each other. The resulting piece of signal is then cross-correlated with the
embedded pseudorandom signal, and the maximum of the cross-correlation shows the
amount of rotation as a multiple of π/20. For example, if the cross-correlation peak is
at a, the amount of rotation is roughly a·π/20.
After the rotation has been found as a multiple of π/20, a more accurate value can be
determined. The original angles of the embedded pseudorandom sequence are subtracted
from the angles of the peaks, and the value for the rotation is then obtained by
taking a median of the resulting values.
The scale factor is calculated by taking a trimmed mean of the radii of the peaks
and dividing the value by the original radius of the embedded pseudorandom
sequence. When the scaling and rotation parameters are found, the distortions can be
inverted with the matrix operations explained in section 2.3.5. The image after
correction of rotation and scaling is shown in Figure 24.
Figure 24. The image after correction of rotation angle and scaling.
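The two estimates can be sketched as below: the rotation as the median of per-peak angle differences, and the scale as a trimmed mean of the peak radii divided by the embedding radius. Matching peaks to the original angles by sorting, and trimming only the extreme radii, are simplifications for illustration.

```python
import numpy as np

def rotation_and_scale(peaks_yx, orig_angles, orig_radius, centre):
    """Estimate rotation (median of angle differences) and scale (trimmed
    mean of peak radii over the embedding radius). A sketch; names and the
    sort-based peak matching are illustrative assumptions."""
    cy, cx = centre
    ys = np.array([p[0] - cy for p in peaks_yx], dtype=float)
    xs = np.array([p[1] - cx for p in peaks_yx], dtype=float)
    angles = np.sort(np.arctan2(ys, xs))
    radii = np.hypot(ys, xs)
    # rotation: median of the per-peak angle differences
    rotation = np.median(angles - np.sort(np.asarray(orig_angles, dtype=float)))
    # scale: trimmed mean of radii (drop smallest and largest) / original radius
    trimmed = np.sort(radii)[1:-1] if len(radii) > 2 else radii
    scale = trimmed.mean() / orig_radius
    return rotation, scale
```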
The algorithm for the extraction method is as follows:
1. Pad the image with zeros to its original geometry
2. Calculate the Fourier domain magnitudes
3. Apply Wiener filtering to remove noise
4. Find out a rough estimate of the rotation angle and scale factor
4.1. Apply thresholding
4.2. Search through integer radii between frequencies f1 and f2 with cross-
correlation between a radius and the pseudorandom sequence used in
embedding
4.3. Select the radius with the maximum cross-correlation value as the radius of
the circle
4.4. Calculate cross-correlation between the sequence at selected radius and
pseudorandom sequence
4.5. Select the location of the cross-correlation peak as a rough estimate of the
rotation angle
5. Refine the estimate
5.1. Calculate the exact locations of the template peaks with the rough estimates
5.2. Take a median from the angles of the peaks to get the angle of rotation
5.3. Take a trimmed mean from the radii of the peaks to get the amount of scale
5.2. Spatial domain template
Translation, that is, how far from its original place the watermark has shifted,
describes the location of the watermark in the image. Locating the watermark is not
so easy a task as it may appear at first sight. The watermark has probably been
rotated and scaled, and those transforms must be inverted before the watermark can be
located accurately. Also, there is no point in searching through the whole image
for the starting point of the watermark; we should be able to restrict
the search somehow. Here, a separate watermark has been embedded in the spatial
domain to serve as a template. Locating the watermark is now faster than an
exhaustive search, but the template has its own impact on the imperceptibility of the
watermark load.
5.2.1 Embedding
The watermarking system should be robust against the translation attack, but the
Fourier transform magnitudes are invariant to translations and thus cannot reveal it.
Therefore, another watermark is needed to recover the image from a translation attack.
Template
The template watermark for recovering from translation attack is embedded in the
spatial domain and the shape of it is shown in Figure 25. The template consists of
two similar parts, one for the horizontal translations and the other for the vertical
translations. A template part, either horizontal or vertical, is built from a small
pseudorandom sequence of length 127. The sequence is an m-sequence and its length
is carefully chosen to be robust enough. A longer sequence would be
sensitive to small rotations but, on the other hand, a shorter sequence would be
difficult to find with cross-correlation.
Figure 25. The template embedded in the image in order to recover the
message watermark from a translation attack.
The m-sequence is repeated across part of the image, separately horizontally and
vertically. The horizontal pattern is formed as follows: The first line is embedded in a
suitable row so that the final pattern is similar to that in Figure 25. The second line is
an exact copy of the first line, but the third and fourth lines are skipped. This is done
because the pattern should not be visible to the human eye in the final image; if all
the lines were used, the periodical pattern of the template would show. The fifth line is
otherwise similar to the first line, but the m-sequence is shifted to the right by 2
pixels, so the pattern appears oblique. The vertical pattern is similar to
the horizontal case, but the lines are columns instead of rows.
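The construction of the horizontal pattern (two identical rows, two skipped rows, then the m-sequence shifted right by 2 pixels, and so on) can be sketched as follows; the `strength` parameter is an assumed embedding amplitude.

```python
import numpy as np

def horizontal_template(height, width, mseq, strength=1.0):
    """Build the oblique horizontal template pattern: rows k and k+1 carry
    the m-sequence, rows k+2 and k+3 are skipped, and each new block is
    shifted right by 2 pixels (a sketch; `strength` is assumed)."""
    pat = np.zeros((height, width))
    seq = np.asarray(mseq, dtype=float)
    n = len(seq)
    shift = 0
    for row in range(0, height, 4):              # use two rows, skip two
        pat[row, :n] = strength * np.roll(seq, shift)
        if row + 1 < height:
            pat[row + 1] = pat[row]              # second line is an exact copy
        shift = (shift + 2) % n                  # fifth line shifted right by 2
    return pat
```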
5.2.2 Extracting
The cross-correlation is used here to extract a watermark from the spatial domain, but
the problem is accuracy.
Quarter-pixel interpolation
Trying to find the template pattern from an image will not always end as expected,
because of all the geometrical distortions discussed in section 2.3.3. The main idea
here is to calculate a cross-correlation between the embedded m-sequence and every other
row or column, shift the result by a suitable amount and add all the results together.
The reason for this is that averaging the cross-correlation results diminishes
noise.
From the two resulting sequences, one for the rows and one for the columns, it is
possible to see how much the image has been translated in each direction. The
problem with this is the fact that the amount of translation can be determined only
with an accuracy of one pixel, but in the real world the image may be shifted by a
fraction of a pixel, for example. An error of half a pixel may very well destroy the
reading of a watermark. Therefore, interpolation methods are applied to achieve
better precision.
Here, quarter-pixel interpolation is applied by doing half-pixel interpolation twice,
and bilinear interpolation is used to determine the values at the midpoints
between the pixels. Bilinear interpolation of a point is calculated by taking the
closest 2x2 neighbourhood of known pixel values surrounding the unknown pixel
value. The unknown pixel value is then calculated with a weighted average of the
four surrounding pixel values, as illustrated in the Figure 26. If all the distances from
the known pixel locations to the unknown pixel are equal, the interpolated value is
simply the sum of known pixel values divided by four.
Figure 26. Bilinear interpolation of an unknown pixel value.
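A minimal sketch of the bilinear step, with `tx` and `ty` as the fractional distances of the unknown point from (x1, y1):

```python
def bilinear(f11, f21, f12, f22, tx, ty):
    """Weighted average of the four surrounding pixel values.
    f11 is the value at (x1, y1), f21 at (x2, y1), f12 at (x1, y2),
    f22 at (x2, y2); tx, ty lie in [0, 1]."""
    return ((1 - tx) * (1 - ty) * f11 + tx * (1 - ty) * f21
            + (1 - tx) * ty * f12 + tx * ty * f22)

# At the exact midpoint (tx = ty = 0.5) this reduces to the mean of the four
print(bilinear(10, 20, 30, 40, 0.5, 0.5))   # prints 25.0
```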
Determining translation parameters
The determination of the translation parameters requires some processing power. The
rotation- and scale-corrected image is interpolated to a quadruple of its size by
determining the three values at equal distances between known pixels. Also, the
embedded m-sequence is interpolated to a quadruple of its size for the cross-correlation.
The extraction of the watermark is done in two phases, separately for the horizontal and
vertical translations. Unfortunately, finding the translation parameters is not
straightforward; the determined values must be suitably combined before the
values for the horizontal and vertical translations are found.
The cross-correlation with the embedded m-sequence is calculated for every other
row, and the results of the cross-correlations are shifted by two and added up into a
vector, so that the possible cross-correlation peak is strengthened. The amount of
translation can be calculated from the location of the peak because we know the size
Figure 27 there is a filtered plot of the resulting cross-correlation sequence.
Figure 27. Cross-correlation image of the horizontal template used for
determining the amount of the translation in the image.
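The shift-and-add accumulation can be sketched as follows: each row's cross-correlation is shifted one step less than the previous one, so that the peaks of the oblique template line up before summing. The function and parameter names are illustrative.

```python
import numpy as np

def shifted_sum_correlation(image, mseq, row_step=2, shift_step=1):
    """Cross-correlate every `row_step`-th row with the m-sequence and sum
    the results with a decreasing per-row shift so the template peaks align
    (a sketch of the procedure described above)."""
    n_corr = image.shape[1] + len(mseq) - 1      # length of one 'full' correlation
    used = list(range(0, image.shape[0], row_step))
    max_shift = (len(used) - 1) * shift_step
    acc = np.zeros(n_corr + max_shift)
    for k, r in enumerate(used):
        c = np.correlate(image[r], mseq, mode="full")
        off = max_shift - k * shift_step         # one step less than the previous row
        acc[off:off + n_corr] += c
    return acc
```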
In Figure 28, there is an example of the template extraction process, where x1,
x2, y1 and y2 are unknowns. The original size of the image was 512x512 and the
template watermark was embedded between lines 64 and 192, as explained in the
preceding sections. After the print-scan process, the image is distorted and there are
borders around the image, resulting from incorrect cropping while scanning the
image.
The image size is somewhat larger than the original image size due to the
background scanned along with the image. In Figure 28, the rotation and scale have been
corrected from the distorted image, and it can be assumed that the distorted image
now contains the watermarked image in its original size; we just do not know its
exact location.
To be able to determine the location of the image, the unknowns in Figure 28
should be solved. The image is first interpolated to achieve an accuracy of 0.25
pixels. After interpolation, the amount of translation can be determined by
calculating a cross-correlation between the embedded and interpolated m-sequence
and every other line. There is no need to calculate cross-correlation with every line
49
Figure 28. The translation template after spatial shift.
because the template is always similar in interpolated image in eight succeeding lines.
In the original image, the template is similar in two succeeding lines. The cross-
correlations are added up to each other, by first shifting each line with one more than
the previous line and then summing it to previous results. The shifting is done so that
the possible peaks that declare the location of a template line in the cross-correlation
sequences are added up to each other and so strengthened. The same process is
repeated for the vertical dimension, resulting in two cross-correlation sequences and
two peaks.
The locations of the peaks do not reveal the translation parameters right away
because of all the shifts and additions. To find the translation parameters, we must
remove the effect of the image from the locations of the cross-correlation peaks. In
this example, this means that we take the locations of the peaks in the two cross-
correlation sequences and subtract the known location of the translation template in
the original image. That is, we add up 192, the location of the template on the first
template row, 127, the length of the template, and the number of shifts before the
first template line, that is 32, because the cross-correlations are calculated with only
every other line. All these values are multiplied by four, because the image has
been interpolated to a quadruple of its size, and subtracted from the locations of the
two peaks. This way, we find two values that include the translations.
The two values are not enough to find four unknowns, however. Therefore we
calculate two more values from the other end of the cross-correlation sequences. That
is, we take the length of the sequences and subtract the locations of the peaks from
those. These new values can then be processed as above and two more values are
received.
The four unknowns can be solved from the following equations.
\begin{cases} val_1 = x_2 + \frac{1}{2}y_1 \\ val_2 = \frac{1}{2}x_1 + y_2 \\ val_3 = x_1 + \frac{1}{2}y_2 \\ val_4 = \frac{1}{2}x_2 + y_1 \end{cases} \quad (38)
where val1–val4 are the values extracted above and x1, x2, y1 and y2 are the
unknown translation parameters. The final image after translation correction is shown
in Figure 29. The image can now be extracted from the background and the actual
value-adding watermark can be read.
Figure 29. Watermarked and print-scanned image after correction of translation,
rotation and scaling.
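Assuming the extracted values pair up so that val1 = x2 + y1/2 and val4 = x2/2 + y1 (and val3, val2 likewise for x1 and y2), which is one reading of equation (38), each pair forms a small 2x2 linear system; a sketch:

```python
import numpy as np

def solve_translation(val1, val2, val3, val4):
    """Solve the four translation unknowns, assuming the pairing
    val1 = x2 + y1/2, val4 = x2/2 + y1 and val3 = x1 + y2/2,
    val2 = x1/2 + y2 (an assumed reading of equation (38))."""
    A = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
    x2, y1 = np.linalg.solve(A, [val1, val4])
    x1, y2 = np.linalg.solve(A, [val3, val2])
    return x1, x2, y1, y2
```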
A pseudo code representation of the extraction method is as follows:
1. Apply quarter-pixel interpolation to the image
2. Process horizontal part of the template
2.1. Calculate cross-correlation with every other row of the image
2.2. Shift every cross-correlation result by one more than the result of the
previous row so that the peaks are in the same line
2.3. Add all the results together
2.4. Find the maximum peak and calculate the distance from both ends of the
sequence
2.5. Remove the location of the template in the original image
3. Process vertical part of the template
3.1. The same as above, but with columns instead of rows
4. Solve the amount of translation from the received results
5.3. Wavelet domain multibit message
The wavelet domain is very sensitive to small geometrical distortions and all the
geometrical distortions must be removed before the message watermark can be read
from the wavelet domain. That is the reason why the template watermarks are used to
remove the effects of rotation, scale and translation from the image. This section
explains how the message watermark is embedded with a spread spectrum technique
and extracted with a method based on a thresholded correlation receiver.
5.3.1 Embedding
Error-correction coding
Before embedding the message watermark, the message was protected with error-
correction coding, and (15, 7) BCH (Bose-Chaudhuri-Hocquenghem) codes were
chosen for their widespread use and simplicity. BCH codes are multilevel, cyclic,
variable-length codes applied to correct multiple random error patterns, and BCH
(15, 7) in particular is able to correct two errors. The BCH codes are based on the idea
of adding parity bits to the code word to check whether any changes have occurred. [45]
When calculating BCH codes, Galois fields are applied. Galois fields are also called
finite fields because they contain only a finite number of elements. For example, a
Galois field GF(q) is a field with q elements, where q is a finite number. Every Galois
field has at least one primitive element a such that every field element except zero
can be expressed as a power of a. [45]
The BCH design rule requires that there are twice as many powers of a as the error
correction capacity t. If q = 2^m, where m is any integer ≥ 3, the elements of the field
can be represented by polynomials whose coefficients are elements of the field GF(2),
that is, 0 and 1. The block length of such a code is n = 2^m − 1 and the error correction
capacity of the code is t < (2^m − 1)/2. [45]
The final code word consists of two parts: the message part and the remainder for
checking the message. The remainder part, that is, the part that contains parity bits, is
calculated with a generator polynomial. Here, for BCH (15, 7), the generator
polynomial is x^8 + x^7 + x^6 + x^4 + 1, which has been calculated with Matlab. The
length of the code word is then 15 and the length of the message is 7. The code word
can be generated by multiplying the polynomial corresponding to the message word by
the generator polynomial.
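The codeword generation described above, a GF(2) polynomial multiplication of the message word by the generator x^8 + x^7 + x^6 + x^4 + 1, can be sketched as follows; the example message bits are arbitrary.

```python
def gf2_polymul(a, b):
    """Multiply two GF(2) polynomials given as bit lists, lowest-degree
    coefficient first (XOR replaces addition over GF(2))."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj
    return out

# BCH(15, 7) generator polynomial x^8 + x^7 + x^6 + x^4 + 1,
# lowest-degree coefficient first:
g = [1, 0, 0, 0, 1, 0, 1, 1, 1]
message = [1, 0, 1, 1, 0, 0, 1]          # an arbitrary 7-bit example message
codeword = gf2_polymul(message, g)       # 15-bit code word
```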
Spread Spectrum Technique
The technique used here is similar to some extent to that by Keskinarkaus et al. in
[46]. The message bits that are protected with BCH code are embedded in the image
in the wavelet domain. The image is decomposed into the first-level sub-bands using Haar
wavelets, and the watermark is embedded in the detail coefficients while the
approximation coefficients are left unmodified. In [46] the watermark was
embedded in the approximation coefficients to gain better robustness, but here the
properties of the detail coefficients are employed because they offer better
imperceptibility. Especially the horizontal detail coefficients are used.
As in [46] the watermark is embedded with
Y_{l,f}^{**}(n) = \begin{cases} Y_{l,f}^{*}(n) + \alpha\, m(k), & \text{message bit} = 1 \\ Y_{l,f}^{*}(n) - \alpha\, m(k), & \text{message bit} = 0, \end{cases} \quad (39)
where Y* is an image which has already been watermarked with the templates in the
Fourier and spatial domains. Y*_{l,f}(n) is the sub-band of Y* in the l-th resolution
level and f-th frequency orientation. Y**_{l,f}(n) is the new watermarked sub-band,
where ** means that multiple watermarking has been applied. α is a scaling coefficient
to control the embedding strength and m(k) is the m-sequence, the length of which
controls the chip rate for spreading.
After the message has been embedded, the inverse wavelet transform is applied to
the image. The amount of distortion and noise introduced by the multiple
watermarking is evaluated visually and with the PSNR value. The results of the
evaluation are presented in the upcoming sections.
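A sketch of the additive embedding of (39) on a flattened vector of detail coefficients: each message bit adds or subtracts one α-scaled copy of the m-sequence. The chip-block layout and the α value are assumptions for illustration.

```python
import numpy as np

def embed_bits(coeffs, bits, mseq, alpha=0.1):
    """Additive spread-spectrum embedding: message bit 1 adds, bit 0
    subtracts, an alpha-scaled m-sequence in consecutive coefficient
    blocks (a sketch; `alpha` and the block layout are assumed)."""
    out = np.array(coeffs, dtype=float)
    L = len(mseq)
    chips = np.asarray(mseq, dtype=float)
    for k, bit in enumerate(bits):
        sign = 1.0 if bit == 1 else -1.0
        out[k * L:(k + 1) * L] += sign * alpha * chips
    return out
```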
5.3.2 Extracting
All the geometrical distortions must be corrected before the watermark can be read
from the wavelet domain. After correcting the distortions the wavelet transform is
applied to the watermarked image and the detail coefficients are divided into small
segments of the same size as the m-sequence used for embedding the message. The
message watermark is extracted by calculating a mean removed cross-correlation
between the coefficient segment and the m-sequence. The result of the correlation is
analyzed and the message bit is chosen to be 1 if the correlation value is above a
certain threshold value and 0 otherwise.
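The receiver can be sketched as a mean-removed correlation of each coefficient segment against the m-sequence; the threshold value is an assumed tuning parameter.

```python
import numpy as np

def extract_bits(coeffs, mseq, n_bits, threshold=0.0):
    """Thresholded correlation receiver: the bit is 1 when the mean-removed
    correlation of a coefficient segment with the m-sequence exceeds the
    threshold, 0 otherwise (a sketch; `threshold` is assumed)."""
    L = len(mseq)
    m = np.asarray(mseq, dtype=float)
    m = m - m.mean()
    bits = []
    for k in range(n_bits):
        seg = np.asarray(coeffs[k * L:(k + 1) * L], dtype=float)
        corr = np.dot(seg - seg.mean(), m)
        bits.append(1 if corr > threshold else 0)
    return bits
```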
5.4. Experiments and results
The image used in the experiments was the famous Lena image of size 512x512
pixels. The message was embedded in the image with spread spectrum techniques,
and before embedding, the message was error coded with the (15, 7) BCH code, which
has an error correction capability of 2 bits. After error-correction coding, the length
of the message was 135 bits.
After embedding the message and the template watermarks the image was JPEG
compressed with different compression ratios. Compression ratios examined here
were 100, 80, 60 and 40. The images used in the testing are included at the end of
this work as Appendix 3. It was noticed that different printers give different printing
qualities, and thus two printers were used for the experiments. Most of the work was
done with a Hewlett-Packard ColorLaserJet 5500 DTN printer, but one image was
printed with a Hewlett-Packard ColorLaserJet 4500 DN printer. The result of the latter
printer was significantly darker than the result with the 5500 DTN printer, as can be
seen from Figure 30. All the images were of a physical size of 10.3 cm x 10.3 cm.
The scanner utilized was an Epson GT-15000, and every image was
scanned 50 times at 300 dpi and then 50 times at 150 dpi and saved in
uncompressed TIFF format. The image was rotated randomly between separate
scans, and the rotation angle varied between -45 and 45 degrees. Also, the
scanned image area was changed, that is, how much white was left around the
scanned image.
Figure 30. Lena image printed with different printers and then scanned. a) printed
with HP LaserJet 5500 DTN printer, b) printed with HP LaserJet 4500 DN printer.
The quality of the image was tested with PSNR and PSPNR values after
embedding the watermarks and compressing the image with JPEG. The results of the
PSNR and PSPNR calculations are collected in Table 1. From the large values in the
table it is possible to see that the embedding method works well and the quality of
the images stays good through the embedding process. It must be noted, however,
that the printing process somewhat flattens the pixel values, making the watermark
even more difficult to perceive. This works in favour of the imperceptibility of the
watermark but against its robustness.
Table 1. PSNR and PSPNR after compression
JPEG Compression Ratio PSNR PSPNR
100 39.5 57.6
80 37.3 49.9
60 36.5 47.7
40 35.9 46.0
The images were scanned with two different scanning resolutions, but before
extracting the watermarks, the image areas scanned at 300 dpi were scaled to 25%
of their size. This was done to reduce the computational complexity
and processing times. The results were gathered into two tables, Table 2 and Table 3.
The results of the reliability calculations of the extraction process are shown in
Table 2. The table shows the calculated success ratios, that is, percentage of times
when the message was extracted correctly. On the rows, there are different
compression ratio values for images before printing. The columns of the table show
the results with two different scanning resolutions.
Table 3 contains the average BER (Bit Error Ratio) values for different images.
Table 3 is organized in the same way as the previous table: on the rows, there are
different compression ratio values for images before printing, and the columns show
the results for the two scanning resolutions. The BER was calculated from the
received message before error correction by comparing the received bits to the
embedded bits. Thus, the BER value here represents the quality of the channel, that
is, how many bit errors occur between printing and scanning. The value in brackets
indicates the average BER when the extraction process was not a success.
Table 2. Success ratio with different compression ratios and scanning settings
JPEG Compression Ratio 300dpi 150dpi averaged
(uncompressed) 86.0% 90.0% 88.0%
100 94.0% 94.0% 94.0%
80 92.0% 92.0% 92.0%
60 78.0% 82.4% 80.2%
40 62.0% 58.0% 60.0%
100 (4500 DN) 14.0% 21.6% 17.8%
Table 3. Average BER with different compression ratios and scanning settings
JPEG Compression Ratio 300dpi 150dpi averaged
(uncompressed) 6.5% (33.5%) 4.5% (32.7%) 5.5%
100 4.6% (50.2%) 3.6% (30.2%) 4.1%
80 3.5% (20.8%) 4.0% (31.3%) 3.8%
60 9.1% (33.2%) 6.8% (27.9%) 8.0%
40 14.9% (29.5%) 11.4% (23.0%) 13.2%
100 (4500 DN) 19.6% (21.6%) 21.5 % (22.9%) 20.6%
5.5. Discussion
After combining the results in the two tables it is clear that the method is fairly
robust against rotation, scale and translation attacks. The method is also robust
against some JPEG compression, but more work should be done to improve the
reliability of the method.
From the results in Table 2 it is possible to see that the success ratio decreases
when the quality of the JPEG compression decreases. This was expected, but it was
surprising how large the impact of selecting the printer actually is. By comparing the
first and last lines of Table 2, the importance of selecting the printer becomes visible.
The first line shows the results when a JPEG-compressed image with a compression
ratio of 100 is printed with the HP LaserJet 5500 DTN, whereas the last line shows the
results for a similar image printed with the HP LaserJet 4500 DN. The same scanner
was used for both images all the time. The results show a remarkable difference in
watermark reading reliability, and therefore it can be remarked that the printer
quality, and not solely the scanner quality, should also be considered when designing
a watermarking system.
In this method the value-adding watermark was embedded in the detail coefficients
of the wavelet transform. However, the degradation caused by JPEG compression hits
the details of an image most severely, and so the high-frequency wavelet
detail coefficients may not survive very well. Instead of embedding the watermark in the
detail coefficients, it would be interesting to study whether embedding it in the
approximation coefficients would increase the reliability.
In the method of Keskinarkaus et al. [46], the watermark was embedded in the
approximation coefficients with a method fairly similar to the one used here. In the
method by Keskinarkaus et al., the robustness was tested only with rotation angles
between -15 and 15 degrees, whereas in the previously described method the rotation
angle was varied between -45 and 45 degrees and the method was found robust against
the rotations. The method of Keskinarkaus et al. was, however, more robust against
JPEG compression, where even success ratios of 100% were reported with a
compression ratio of 80%.
The results in Table 3 show that JPEG compression strongly affects the BER when
the compression ratio is 60% or lower. With compression ratios equal to or greater
than 60%, the amount of bit errors is still manageable, but if the compression ratio is
smaller than 60%, the extraction of the watermark and the correcting of the bit errors
get difficult.
While comparing the method with previous methods it can be seen that the method
works very well. For example, the method works better in comparison with the
block-based method by He and Sun [6]. In their experiments, they got BER values of
15%, while in the previously described method the BER values were under 5% most
of the time. Not until the images were compressed beforehand with JPEG
compression ratios under 60 did the BER values get worse. One reason for the large
differences between the methods is the fact that the capacity is significantly higher in
the method of He and Sun, which weakens the robustness.
The exact BER values are included at the end of this work as graphs in Appendix
2. From the graphs it is possible to see that in most of the cases the message is not
totally lost but buried under noise. These kinds of messages could be saved
with stronger error correction coding. On the other hand, sometimes the BER
approaches 0.5 and then the message cannot be read anymore. In such situations
the correction process of the geometrical distortions has probably failed.
It was noted that the bit error rate was acceptable when the JPEG compression ratio
was 60% or above, so quite a lot of improvement could be achieved by using better
error-correcting codes. Unfortunately, the capacity of the method is not high, and
therefore the error correction codes should be chosen carefully in future work so
that the information rate would be optimal for the task.
6. PRINT-CAM RESILIENT WATERMARKING
Print-scan robustness is a good requirement to begin with in watermarking systems
but the number of applications is limited. The print-cam process would have a great
deal more applications because many people carry around a camera phone in their
pockets, but the attacks are more severe. The print-scan process introduces many of
the attacks that are present in the print-cam process, but in a simplified form. For
example, in the print-scan process the image may be translated horizontally and
vertically in the scanned image, whereas in the print-cam process the distance
between the image and the camera also varies. This three-dimensionality of the
problem makes the extraction of the watermark more difficult than in the print-scan
process, and different synchronization methods are required.
Here a frame is added around the image and a method for finding the corner points
of the frame is proposed. With the corner points, the affine transformation
parameters are determined to approximate and invert perspective transformations.
The block diagram of the proposed system is shown in Figure 31.
The method for embedding and extracting the multibit message is the same as in
section 5.3. Unfortunately, the multibit message watermark is very sensitive to even
small distortions, and although the rotation, scale and translation are inverted with
the affine transform, the inversion process is not accurate enough; thus a more
specific method is needed to find the exact amount of translation. Here, the same
method for determining translation is used as in section 5.2.
Before extracting any of the watermarks, the barrel distortions are inverted with
the Camera Calibration Toolbox [8]. Not all pictures taken with a camera contain
lens distortions, but in some cameras they are so severe that their correction
cannot be neglected. Lens distortions such as barrel distortion occur due to the
lens properties, and therefore the parameters for the correction transform need to be
calculated only once. The parameters can be determined beforehand with a reference
image, as explained in section 2.3.4.
Figure 31. Block diagram of the proposed print-cam robust method.
6.1. Frame detection method
Not much research has been done in the field of reading watermarks with a camera
phone, but this is no wonder, for camera phones have been around only since the
year 2000, when the Sharp corporation announced the first camera phone ever. Only
during the last few years have camera resolutions grown high enough for
watermark detection, and the first commercial applications have been invented, as
explained in chapter 4.
The problem of reading watermarks with camera phones is somewhat similar to
that of the print-scan process, but the biggest difference is the extra dimension, the
effects of which need to be considered. While in the two-dimensional problem we
examined a planar surface without perspective, in the three-dimensional
problem we examine the surface from somewhere above. In the simplest case of the
three-dimensional problem, the optical axis is perpendicular to the plane and the
resulting picture can be analysed in the same way as the two-dimensional case. If the
plane is tilted relative to the optical axis, reading the watermark gets more difficult
because the relative distances between the points on the surface plane and the camera
have changed.
As there is no way of knowing the amount of tilting, the only acknowledged
solution is to use an affine transformation as an approximation to invert the
effects of the perspective distortion. The method used here is a modified version of
that by Katayama et al. [47], where a frame was added around the image and the
corner points were calculated to determine the affine transform parameters.
6.1.1 Embedding
The frame embedded here is identical to that by Katayama et al. [47]. A frame is
added along the outside of the image as in Figure 32. To separate the frame from the
image, the frame is added at a small distance from the border of the image. The
distance is related to the width of the frame so that it is possible later to determine the
exact location of the image. The colour of the frame was chosen to be blue, but it
could be any other colour with an intensity level different enough from the background.
The frame width and the width of the gap between the image and the frame were
chosen to be 5 pixels.
Figure 32. Framed image.
6.1.2 Extracting
In the method by Katayama et al. [47], the extraction of the frame was performed
with frame-detection filters and thresholds. A point was judged to be part of the
frame if the result of the frame-detection filter at that point was bigger than a
predefined threshold value. The correct threshold value varies from image to image,
and even over the same image with the lighting, and therefore thresholding was not
used here; instead, a different kind of method was developed.
The beginning of this method is similar to that of Katayama et al. [47]. As in their
method, the picture taken is divided by a crosswise line into upper and lower sections.
It can be assumed that the watermarked image lies somewhere around the centre of
the captured image and so the crosswise line is assumed to cross left and right sides
of the frame. The frame sides can thus be found by searching along that line. At this
point of the calculations, we do not know the scale of the picture, and so we do not
know the width of the frame. However, it can be estimated by differentiating the
pixel intensity in the crosswise direction and calculating the width from the positions
of the maximum and minimum values. The process of frame detection is illustrated
in Figure 33.
Figure 33. The frame is found by searching along a crosswise
line and advancing up and down the found side of the frame.
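The width estimation described above can be sketched as follows. Here `profile` stands for a 1-D array of pixel intensities sampled along the crosswise line, and the synthetic example assumes a dark frame on a bright background; both the function name and the example values are illustrative only, not taken from the thesis implementation.

```python
import numpy as np

def estimate_frame_width(profile):
    """Estimate the frame width from a 1-D intensity profile taken along
    the crosswise line. The derivative has its extrema at the two edges
    of the frame band, so the distance between the positions of the
    minimum (entering the dark frame) and the maximum (leaving it)
    approximates the frame width in pixels."""
    diff = np.diff(profile.astype(float))
    return abs(int(np.argmax(diff)) - int(np.argmin(diff)))

# Synthetic profile: bright background (200) with a 5-pixel dark frame (50).
profile = np.array([200] * 10 + [50] * 5 + [200] * 10)
print(estimate_frame_width(profile))  # -> 5
```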
When the width of the frame is known, the information can be used in the frame
detection filter. The frame detection filter matrix is of size 3 x n, where n is two
pixels more than the width of the frame. The two edge columns of the matrix contain
the value -2 and the middle columns are filled with the value n - 2. For example, for
a frame width of 5 pixels the frame detection filter is of the form

F = [ -2  5  5  5  5  5  -2
      -2  5  5  5  5  5  -2
      -2  5  5  5  5  5  -2 ] .  (40)
For each point to be examined, a convolution value is calculated with

FrameValue = \sum_{i=1}^{3} \sum_{j=1}^{n} I_{ij} F_{ij} ,  (41)
where I is a small part of the image centred about the point to be examined.
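A minimal sketch of building the frame detection filter of equation (40) and evaluating equation (41) at one position might look as follows; the function names are hypothetical and the 3 x n image patch is assumed to be extracted already.

```python
import numpy as np

def frame_detection_filter(width):
    """Build the 3 x n frame detection filter of equation (40):
    n = width + 2, value (n - 2) in the middle columns and -2 on the
    two edge columns."""
    n = width + 2
    f = np.full((3, n), n - 2, dtype=int)
    f[:, 0] = -2
    f[:, -1] = -2
    return f

def frame_value(image_patch, filt):
    """Equation (41): sum of the elementwise product of a 3 x n image
    patch with the filter."""
    return int(np.sum(image_patch * filt))

F = frame_detection_filter(5)  # first row: -2 at the edges, 5 in the middle
print(frame_value(np.ones((3, 7), dtype=int), F))  # -> 63
```

A patch of all ones gives 3 rows of (5 x 5 - 2 - 2) = 21, i.e. 63, while a patch matching the bright-band frame shape responds even more strongly relative to its surroundings.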
The algorithm begins from the midpoint of the left edge of the captured image.
From there the calculations advance to the right searching for the left side of the
frame. After the location of the left side of the frame has been found it can be traced
up- and downwards to find the side of the frame, as shown with red coloured lines in
Figure 33. The side can be slanted and therefore one pixel to the left and right from
the current side position should also be examined instead of examining only the pixel
directly above or below the current position.
Unlike in the method by Katayama et al. [47], where the corners and sides of the
frame were determined with a threshold value, a different approach has been chosen
here which does not require thresholding. The examination of the pixels is done with
the frame detection filter described earlier. Of the three pixel values examined on
every row, the one with the maximum filter value is chosen to be part of the frame.
At some point the calculations go over the point where the side of the frame ends but
the calculations are continued nevertheless. The values after that are not correct but
the length of the incorrect segment is assumed to be small compared with the length
of the frame side. Therefore we can take all the points of the frame just calculated
and approximate them with a straight line. The same procedure is repeated for all the
sides of the quadrilateral frame. After the straight lines have been fitted, the corners
of the quadrilateral can be estimated from the intersections of the lines.
The corner approximations obtained from the intersections of the straight lines are
not entirely correct, and therefore a small area around each intersection is selected
and inspected further. The exact location of the corner point is determined by
correlating a small corner image with this area. Using correlation, the exact location
of the crossing can be determined, and the corners of the quadrilateral frame are thus
found as shown in Figure 34.
Figure 34. The found corners of the frame.
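The correlation-based corner refinement can be sketched as a brute-force normalized cross-correlation; the function and argument names are hypothetical, and the search area is assumed to be a small window cut out around the estimated intersection.

```python
import numpy as np

def refine_corner(search_area, corner_template):
    """Return the (row, col) offset in search_area where a small corner
    template correlates best (brute-force normalized cross-correlation)."""
    th, tw = corner_template.shape
    t = corner_template - corner_template.mean()
    best, best_pos = -np.inf, (0, 0)
    for r in range(search_area.shape[0] - th + 1):
        for c in range(search_area.shape[1] - tw + 1):
            w = search_area[r:r + th, c:c + tw]
            w = w - w.mean()
            denom = np.sqrt((w ** 2).sum() * (t ** 2).sum())
            score = (w * t).sum() / denom if denom > 0 else 0.0
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos

# A small L-shaped corner embedded in an empty area is found exactly.
template = np.array([[1.0, 1.0], [1.0, 0.0]])
area = np.zeros((6, 6))
area[3:5, 2:4] = template
print(refine_corner(area, template))  # -> (3, 2)
```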
The correction of the perspective distortion is done with the following equations [47]:

x = (a_1 x_0 + b_1 y_0 + c_1) / (a_0 x_0 + b_0 y_0 + 1)
y = (a_2 x_0 + b_2 y_0 + c_2) / (a_0 x_0 + b_0 y_0 + 1) ,  (42)

where (x_0, y_0) is a position in the camera picture and (x, y) the corresponding
position in the original picture.
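The eight parameters of equation (42) can be solved from the four corner correspondences by rewriting the equations as a linear system, two equations per corner. The sketch below uses hypothetical names and assumes the corners are given as matching (camera picture, original picture) point pairs; the thesis itself does not spell out the solver.

```python
import numpy as np

def solve_perspective(cam_pts, orig_pts):
    """Solve the 8 parameters of equation (42) from four corner
    correspondences and return a function mapping camera picture
    coordinates to original picture coordinates."""
    A, b = [], []
    for (x0, y0), (x, y) in zip(cam_pts, orig_pts):
        # x * (a0 x0 + b0 y0 + 1) = a1 x0 + b1 y0 + c1, rearranged; same for y.
        A.append([x0, y0, 1, 0, 0, 0, -x0 * x, -y0 * x]); b.append(x)
        A.append([0, 0, 0, x0, y0, 1, -x0 * y, -y0 * y]); b.append(y)
    a1, b1, c1, a2, b2, c2, a0, b0 = np.linalg.solve(
        np.array(A, float), np.array(b, float))

    def correct(x0, y0):
        d = a0 * x0 + b0 * y0 + 1.0
        return ((a1 * x0 + b1 * y0 + c1) / d,
                (a2 * x0 + b2 * y0 + c2) / d)
    return correct

# Sanity check: a pure 2x scaling between the corner sets.
cam = [(0, 0), (1, 0), (1, 1), (0, 1)]
orig = [(0, 0), (2, 0), (2, 2), (0, 2)]
correct = solve_perspective(cam, orig)
print(correct(0.5, 0.5))  # -> (1.0, 1.0)
```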
The algorithm for the extraction process is as follows:
1. Find the width of the frame
1.1. Divide the image with a crosswise line into upper and lower sections
1.2. Differentiate the pixel intensities in the crosswise direction
1.3. Select the width from the positions of the maximum and minimum values
2. Determine the frame detection filter
3. Locate the corner points with frame detection filter
3.1. Start from the middle of the left side of the image
3.2. Find all the sides of the frame
3.2.1. Advance to the right until a side of the frame has been found
3.2.2. Trace the frame up and downwards and examine also the points one
pixel to the left and right of the current side position
3.2.3. Select maximum of the three points to be part of the frame
3.2.4. Rotate the image to find the other sides and go back to 3.2.1 until all
the sides of the frame have been found
3.3. Approximate the points with straight lines
3.4. Calculate the intersections of the lines
4. Refine corner locations with correlation
5. Correct perspective distortions
6.2. Experiments and results
The experiments were done with the 512x512 Lena image and a (15,7) BCH coded
message of length 135 bits, as in the print-scan method described earlier. The
message watermark was embedded in the image as in section 5.3 and, because the
frame cannot correct the translation attack accurately enough, a template watermark
was embedded in the spatial domain as in section 5.2. The frame was attached around
the image to recover the image from geometrical and perspective transforms.
The research was done with four images: one uncompressed image and three JPEG
compressed images, which are shown in Appendix 5. The compression ratios examined
are 100, 80 and 60, and the images were printed with a Hewlett-Packard Color LaserJet
5500 DTN printer. Before printing the images out, PSNR and PSPNR values were
calculated for each of the images. The values are gathered in Table 4, and from
them it is possible to see that the quality of the images stayed high even after the
embedding process.
Table 4. PSNR and PSPNR after the embedding process
JPEG Compression Ratio PSNR (dB) PSPNR (dB)
(uncompressed) 39.2 59.5
100 39.0 58.5
80 37.1 48.6
60 36.6 47.1
Every image was photographed 100 times with resolution 800x600 and 100 times
with resolution 1600x1200. Prior to photographing, the image was pinned against a
wall to make it straight, but no special arrangements were made to prevent the
camera from moving: the pictures were taken as perpendicularly as possible to the
image on the wall, but freehand.
The camera phone used was a Nokia N90 with a 2 megapixel CMOS
(Complementary Metal Oxide Semiconductor) camera with a focal length of 5.5 mm.
This information is useful for determining the camera parameters when correcting the
barrel distortions with the Camera Calibration Toolbox. The available image
resolutions in the camera were 640x480, 800x600 and 1600x1200, but it was found
that the lowest resolution level is too low for watermark extraction.
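As an illustration only: the Camera Calibration Toolbox fits a richer generic lens model from calibration images, but the idea of barrel distortion correction can be conveyed with the common one-parameter radial model. The function name and parameters below are my own, not the toolbox's.

```python
def undistort_point(xd, yd, k1, cx, cy):
    """Map a radially distorted pixel (xd, yd) toward its undistorted
    position using the simple model x_u = x_d (1 + k1 r^2), where r is
    the distance from the distortion centre (cx, cy). Barrel distortion
    corresponds to a negative k1 in this convention."""
    x, y = xd - cx, yd - cy
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2
    return cx + x * factor, cy + y * factor

print(undistort_point(10, 20, 0.0, 0, 0))  # k1 = 0: no change -> (10.0, 20.0)
```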
The resulting success ratios of the method are displayed in Table 5. The rows show
the images with different compression ratios, whereas the columns show the results
with different resolution settings of the camera. In the experiments, the images taken
with resolution 1600x1200 were scaled to 25% prior to estimating the parameters.
This was done to reduce the computational load and thus the processing time and
memory consumption.
Table 5. Average success ratio with different compression ratios and capturing
settings
JPEG Compression Ratio    resolution 800x600    resolution 1600x1200    averaged
(uncompressed) 75.0% 96.0% 85.5%
100 90.0% 90.0% 90.0%
80 82.0% 92.0% 87.0%
60 31.0% 69.0% 50.5%
Table 6 shows the average BER values of the experiments. The arrangement of the
rows is similar to that in the previous table: the rows show BER values for images
with various compression ratios whereas the columns show results for different
resolution values. The BER values for the table were calculated before error
correction because calculations were done to examine how many errors would be
expected in the process, not how well the error correction coding performs. The
value in parentheses indicates the average BER when the extraction process was not
successful.
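The BER values reported here can be computed as sketched below; the bit sequences in the example are hypothetical, and the comparison is done before BCH decoding, as in the experiments.

```python
def bit_error_rate(sent_bits, received_bits):
    """Fraction of message bits that differ between the embedded and the
    extracted bit sequence (computed before error correction decoding)."""
    assert len(sent_bits) == len(received_bits)
    errors = sum(s != r for s, r in zip(sent_bits, received_bits))
    return errors / len(sent_bits)

print(bit_error_rate([1, 0, 1, 1], [1, 1, 1, 0]))  # 2 of 4 bits differ -> 0.5
```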
Table 6. Average BER with different compression ratios and capturing settings
JPEG Compression Ratio    resolution 800x600    resolution 1600x1200    averaged
(uncompressed) 5.6% (11.3%) 3.1% (20.0%) 4.4%
100 4.6% (15.8%) 3.1% (9.3%) 3.9%
80 4.8% (8.9%) 3.6% (5.5%) 4.2%
60 10.7% (11.8%) 6.5% (11.9%) 8.6%
6.3. Discussion
The results in Table 5 show that the method is very promising and works well in the
test case. Unfortunately, the method is fairly sensitive to distortions and some
restrictions were necessary to make the method work: for example, the picture was
taken as perpendicularly as possible above the watermarked image. This is due to the
wavelet domain multibit message watermark, which requires nearly perfect
correction of geometrical distortions.
The multibit watermark is especially fragile against tilting of the optical axis. If the
optical axis is tilted, some parts of the image appear to be closer than the others. In
the camera image, the parts that are close are presented with high resolution, but the
parts that are further away are presented with lower resolution. In some cases the
resolution may be too low, the correction algorithm cannot correct the distortions
accurately enough, and the message will not be extracted correctly. The multibit
message watermark in the wavelet domain requires high resolution and so it will be
destroyed if the tilting of the optical axis is too large.
The choice of resolution at which the image is taken is important. The success
ratios were significantly better when the resolution level of 1600x1200 was used.
From the results it can be deduced that a 2 megapixel camera seems to be enough
for reading a watermark correctly from a printed image, but a higher resolution would
obviously be better. This is not a problem, as the cameras in mobile phones evolve
rapidly, and even while this work is being documented, phones with better camera
capabilities are being released.
Even with this camera and these camera properties, the BER values of the method
are acceptable. Looking at the values in Table 6, it is possible to see that the method
would work better with stronger error correction coding. The values in parentheses
are the BER when the method was not successful, that is, when the error correction
failed. The values are clearly below 0.5, the limit at which error correction becomes
impossible with any coding technique.
More specific information about the calculated BER values is included in
Appendix 4. The figures show that the BER values are usually at the same low level,
but now and then there are peaks up to the level of 0.5. This indicates that the
extraction of the watermark has failed completely. In these cases it can be assumed
that the synchronization process has somehow failed.
In addition to the tilting of the optical axis, one possible reason why the message is
too erroneous to be read correctly is the compression of the image beforehand.
Table 5 shows that the method is robust against slight JPEG compression but
deteriorates rapidly as the compression becomes stronger. This was expected, but
even though JPEG compression worsens the results, the success ratios are still
around 90% with a compression ratio of 80.
It must be noted, however, that the compression ratios reported here reflect only the
amount of compression applied to the image before printing. More compression
occurs when the image is taken with the camera phone, which compresses the image
before saving it to memory. From this point of view, the watermarking method is
even more robust against JPEG compression.
Comparing the method developed here with other similar methods is difficult
because only a few watermarking methods have been proposed for camera phones.
The method by Nakamura et al. [37] also used a frame around the watermarked
image to correct perspective distortions, but the way they handled the results was
different from mine. They reported success ratios as high as 100% when the picture
was taken straight above the watermarked image, but this value cannot be compared
with the method proposed here because they used a much stronger error correction
coding and the camera phones used were different. Also, the capacity of their method
was lower and a smaller message was embedded in the image: only 16 message bits
were embedded against the 63 bits embedded here. This, too, enhanced the
robustness in their favour, and it is claimed that the method proposed here would
compete well with theirs under similar settings.
The experiments were done here in a noise-free environment and many distortions
were neglected. For example, the impact of lighting was not considered, and the light
around the image was stable throughout the experiments. In the future it is important
to do research with different light conditions and variable lighting, as reflections of
light from the image will affect the extraction of the watermark.
Another difficulty to be examined is the content around the image. In real life, an
image is rarely placed alone on a page; often it is surrounded by other images and
text. The method must be improved to handle these kinds of distortions.
Nevertheless, the method is promising and a great deal of information was gained in
the research for future use.
7. DISCUSSION
No one knows what the future brings, but we can always make good guesses. Even
now, more and more content moves around without wires between portable devices:
cell phones, laptops, PDAs and so on. A growing number of people have cell phones
in their pockets accompanied by mp3 players and digital cameras, and even the
limits between devices are diminishing. A cell phone may now contain a media
player and a video camera and still be available to consumers at a reasonable price.
As the properties of devices blend together into one device, so do different media
formats. With watermarking, music files can be included in image files, and links to
websites can be embedded in both of them. In this work, two watermarking methods
were proposed for value-adding watermarking. The first method was robust against
print-scan attacks, which was considered a prerequisite for the second, print-cam
method.
The print-scan process works in a two-dimensional world in which the user must
own a scanner to be able to read the watermark. From the user's point of view, it
would be easier if the watermark could be read by taking a picture of the
watermarked image with a camera phone, so that the watermark could be read at any
time, anywhere.
A motivation for this work was the lack of publications discussing the reading of
watermarks with digital cameras or camera phones. Only a few papers were found,
and almost all of them were developed for commercial purposes. This indicates that
there is a demand for print-cam robust watermarking systems.
The methods presented here were fairly similar in spite of the different
environments they were required to work in. In the print-scan robust method, the
focus was on inverting certain geometrical distortions, that is, rotation, scale and
translation. In addition to these, the print-cam robust method also focused on
correcting the effects of perspective distortions. In both methods, multiple
watermarking was employed: the multibit watermark was embedded in the wavelet
domain and one or two template watermarks were embedded to recover from
geometrical distortions. The parameters for inverting translation were determined
with the same watermark in both methods, but the rotation and scale were calculated
with different kinds of watermarks: with a template in the Fourier domain in the
print-scan robust method and with a visible frame in the print-cam method. Both
methods seemed to work very well, and, with stronger error correction coding, the
results would have been even better.
When storing bits of media for future use, compression algorithms have a huge
role to play. Right now the most popular image compression format is JPEG, and
every watermarking system should be robust against it. Here it has been shown that
both of the methods are robust against JPEG compression down to a compression
ratio of 60. Compression ratios lower than that are rarely used, because beyond a
compression ratio of 60 the compression starts to visibly degrade the quality of the
image.
In future work, the reliability of the methods will be improved and new
synchronization methods will be developed. The focus will shift to print-cam robust
methods, and print-scan robustness will be only the first step towards print-cam
robust systems. The next generation of camera phones will be released soon and the
resolutions of their cameras will increase. Soon the quality of camera phones will
match that of present digital cameras, and thus the research will be done in the near
future with digital cameras instead of existing camera phones.
8. CONCLUSION
The aim of this work was to find a method to read a watermark from a printed image
with a camera phone. As a prerequisite for the problem, a print-scan robust
watermarking method was developed and examined. Based on the results achieved
with the print-scan robust watermarking method, a print-cam robust method was
proposed.
In both of the methods, multiple watermarking was applied successfully and the
results obtained were promising. In the print-scan robust method, three watermarks
were embedded in the image: two template watermarks were embedded in order to
recover the watermark from rotation, scale and translation attacks, and the third
watermark embedded was the multibit watermark containing the actual message.
One of the template watermarks was a pseudorandom sequence embedded in the
magnitudes of the Fourier domain in the form of a circle around the centre of the
magnitude coefficients. With this watermark, the rotation angle and the scaling that
occurred in the scanning process could be inverted. Since the magnitudes of the
Fourier domain are invariant to translation, a second template watermark was
required and was embedded in the spatial domain. The multibit message watermark
was embedded last, in the wavelet domain.
In the print-cam robust method, two invisible watermarks were embedded in the
image and a visible frame was added around the image. The visible frame was
necessary so that the perspective distortions could be approximated and inverted with
affine transformation. The two invisible watermarks embedded were the same as in
the print-scan robust method. One of the invisible watermarks was a template
watermark, embedded in the image in order to recover the watermark from
translation attack, whereas the other invisible watermark embedded was the multibit
message watermark.
The methods were tested by taking multiple pictures of a watermarked image with
a scanner and a camera phone. The success ratios and BER values were calculated
for both of the methods at various resolution levels. In the print-scan robust method,
the results did not differ significantly between resolution levels, but in the print-cam
robust method the difference was obvious: the higher resolution level gave better
results regardless of the compression level of the image used.
The methods were also tested by compressing the test image beforehand with
different JPEG compression ratios. The results were as expected: the success ratio
decreased and the BER increased as the compression ratio decreased. However, the
results were acceptable until the compression ratio went below 60, and thus it can be
concluded that both of the methods are robust against JPEG compression.
The results of the methods do not reach 100%, but with better error correction
coding that value could be approached. Future work includes improving the
print-cam method and moving to digital cameras instead of camera phones. This is
due to the fact that the quality of the cameras in cell phones is improving and will
soon reach that of modern digital cameras.
9. REFERENCES
[1] Cox, I.J., Miller M.L. & Bloom J.A. (2002) Digital watermarking. Morgan
Kaufman publishers, Academic Press, USA, 542 p. ISBN 1-55860-714-5
[2] Hanjalic, A., Langelaar, G.C., van Roosmalen, P.M.B., Biemond, J. &
Langendijk, R.L. (2000) Image and Video Databases: Restoration, Watermarking
and Retrieval. Elsevier Science B.V., Amsterdam, Netherlands, 445 p. ISBN 0-444-
50502-4
[3] Mäkelä K. (2000) Digital Watermarking and Steganography. Diploma Thesis.
University of Oulu, Department of Electrical Engineering, Oulu, Finland.
[4] Chou, C-H & Li, Y-C. (1995) A Perceptually Tuned Subband Image Coder
Based on the Measure of Just-Noticeable-Distortion Profile. In: IEEE Transactions
on circuits and systems for video technology. Dec 1995, Vol. 5, Issue 6, pp. 467 -
476.
[5] Perry, B., MacIntosh, B. & Cushman, D. (2002) Digimarc MediaBridge The
birth of a consumer product, from concept to commercial application. In:
Proceedings of SPIE Security and Watermarking of Multimedia Contents IV, Jan 21-
24, San Jose, California, USA, Vol. 4675, pp. 118-123.
[6] He, D. & Sun, Q. (2005) A Practical Print-scan Resilient Watermarking Scheme.
In: IEEE International Conference on Image Processing (ICIP), Sept. 11-14, 2005,
Vol. 1, pp. I - 257-60.
[7] Solanki, K., Madhow, U., Manjunath, B.S. & Chandrasekaran, S. (2004)
Estimating and Undoing Rotation for Print-scan Resilient Data Hiding. In: IEEE
International Conference on Image Processing (ICIP), Oct. 24-26, Vol. 1, pp. 39-42.
[8] Camera Calibration Toolbox for Generic Lenses. (read 27.8.2006) URL:
http://www.ee.oulu.fi/mvg/mvg.php?page=calibration. Matlab
Appendix 1 Homogeneous coordinates

A point P can be represented in column matrix form as

p = [x  y  z]^T ,  (1)

where x, y and z are the components along the basis vectors at this point, so that

P = P_0 + x v_1 + y v_2 + z v_3 .  (2)

However, this representation is not very good, because it can be confused with the
representation of a vector

W = x v_1 + y v_2 + z v_3 ,  (3)

which does not have any starting point and can be placed anywhere in the space.
Homogeneous coordinates offer a solution to this problem by introducing an extra
dimension. The point P can then be written uniquely as

p = [x  y  z  1]^T ,  (4)

because from equation (2)

P = [v_1  v_2  v_3  P_0] [x  y  z  1]^T .  (5)

Similarly, the vector W can be written in column matrix representation as

w = [x  y  z  0]^T ,  (6)

and

W = x v_1 + y v_2 + z v_3 + 0 P_0 = [v_1  v_2  v_3  P_0] [x  y  z  0]^T .  (7)

We can see that the representations of a point and a vector are now different and
cannot be confused anymore. By the same derivation, an arbitrary transformation
matrix

s = [ a  b  c
      d  e  f
      g  h  i ]  (8)

can be represented in homogeneous coordinates as

s = [ a  b  c  0
      d  e  f  0
      g  h  i  0
      0  0  0  1 ] .  (9)

Although the transformations are now done in four-dimensional space to solve a
three-dimensional case, less arithmetic work is required when homogeneous
coordinates are used. The uniform representation of affine transformations makes
composing successive transformations far easier and faster than in three dimensions.
In addition, modern computers are able to use parallelism to speed up homogeneous
coordinate operations.
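The point/vector distinction (w = 1 versus w = 0) and the composition of transforms by a single matrix product can be illustrated with a short sketch; the helper names and the numpy formulation are my own.

```python
import numpy as np

def translation(tx, ty, tz):
    """4 x 4 homogeneous translation matrix."""
    m = np.eye(4)
    m[:3, 3] = (tx, ty, tz)
    return m

def scaling(s):
    """4 x 4 homogeneous uniform scaling matrix."""
    m = np.eye(4) * s
    m[3, 3] = 1.0
    return m

# Composing successive transformations is a single matrix product:
# first scale by 2, then translate by (1, 2, 3).
T = translation(1, 2, 3) @ scaling(2)

point = np.array([1.0, 1.0, 1.0, 1.0])   # w = 1: a point, translation applies
vector = np.array([1.0, 1.0, 1.0, 0.0])  # w = 0: a vector, translation has no effect
print(T @ point)   # -> [3. 4. 5. 1.]
print(T @ vector)  # -> [2. 2. 2. 0.]
```

Note how the same matrix T affects the point and the vector differently: the translation column is multiplied by the w component, which is exactly the property derived above.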
Appendix 2 BER figures for print-scan method

The BER figures presented here are extracted from the tests of the print-scan method
proposed earlier. In every figure the y-axis shows the BER value and the x-axis gives
the number of the image examined. CR means JPEG compression ratio, and the last
number in the figure caption tells the resolution used in scanning.
Figure A.2.1 BER (CR=uncomp, 300dpi). Figure A.2.2 BER (CR=uncomp, 150dpi).
Figure A.2.3 BER (CR=100, 300dpi). Figure A.2.4 BER (CR=100, 150dpi).
Figure A.2.5 BER (CR=80, 300dpi). Figure A.2.6 BER (CR=80, 150dpi).
Figure A.2.7 BER (CR=60, 300dpi). Figure A.2.8 BER (CR=60, 150dpi).
Figure A.2.9 BER (CR=40, 300dpi). Figure A.2.10 BER (CR=40, 150dpi).
Figure A.2.11 BER (CR=100, 300dpi) Figure A.2.12 BER (CR=100, 150dpi)
(HP LaserJet 4500 DN). (HP LaserJet 4500 DN).
Appendix 3 Images used in print-scan process for testing
Figure A.3.1 Original image before watermarking.
Figure A.3.2 Image after watermark embedding and JPEG
compression with compression ratio of 100.
Figure A.3.3 Image after watermark embedding and JPEG
compression with compression ratio of 80.
Figure A.3.4 Image after watermark embedding and JPEG
compression with compression ratio of 60.
Figure A.3.5 Image after watermark embedding and JPEG
compression with compression ratio of 40.
Appendix 4 BER in print-cam method

The BER figures presented here are extracted from the tests of the print-cam method
proposed earlier. In every figure the y-axis shows the BER value and the x-axis gives
the number of the image examined. CR means JPEG compression ratio, and the last
number in the figure caption tells the resolution used when the picture was taken.
Figure A.4.1 BER (uncomp., 800x600). Figure A.4.2 BER (uncomp., 1600x1200).
Figure A.4.3 BER (CR=100, 800x600). Figure A.4.4 BER (CR=100, 1600x1200).
Figure A.4.5 BER (CR=80, 800x600). Figure A.4.6 BER (CR=80, 1600x1200).
Figure A.4.7 BER (CR=60, 800x600). Figure A.4.8 BER (CR=60, 1600x1200).
Appendix 5 Images used in print-cam process for testing
Figure A.5.1 Original image before watermarking.
Figure A.5.2 Image after watermark embedding before JPEG compression.
Figure A.5.3 Image after watermark embedding (JPEG CR = 100).
Figure A.5.4 Image after watermark embedding (JPEG CR = 80).
Figure A.5.5 Image after watermark embedding (JPEG CR = 60).