CSVT Forgerydetection Manuscript Submitted
Article in IEEE Transactions on Circuits and Systems for Video Technology · May 2011
DOI: 10.1109/TCSVT.2011.2125370 · Source: IEEE Xplore
All content following this page was uploaded by Guo-Shiang Lin on 01 June 2015.
ABSTRACT
In this paper, we propose a passive-blind scheme for detecting forged images. The scheme leverages
quantization table estimation to measure the inconsistency among images. To improve the accuracy of the
estimation process, each AC DCT coefficient is first classified into a specific type, and then the corresponding
quantization step size is measured adaptively from its power spectrum density (PSD) and the PSD’s Fourier
transform. The proposed content-adaptive quantization table estimation scheme is comprised of three phases:
pre-screening, candidate region selection, and tampered region identification. In the pre-screening phase, we
determine whether an input image has been JPEG compressed, and count the number of quantization steps
whose size is equal to 1. To select candidate regions for estimating the quantization table, we devise an
algorithm for candidate region selection based on seed region generation and region growing. Seed region
generation is first used to find a suitable region by removing suspect regions, after which the selected seed
region is merged with other suitable regions to form a candidate region. To avoid merging suspect regions, a
candidate region refinement operation is performed in the region growing step. After estimating the quantization
table from the candidate region, an MLR classifier exploits the inconsistency of the quantization table to identify
tampered regions block by block. Experiment results demonstrate that the proposed scheme can estimate quantization tables accurately and thereby identify tampered regions.
Index Terms— forgery detection, quantization table estimation, image forensics, copy-paste tampering
I. INTRODUCTION
In the last decade, the volume of digital image/video content has increased dramatically due to the availability
of low-cost multimedia devices. At the same time, it has become easier to copy, duplicate, and manipulate such
content without degrading the quality because of the development of increasingly sophisticated digital
processing tools. In addition, computer graphics can now generate images with a photo-realistic quality level, so
it is expected that confidence in the reliability and veracity of digital images/videos will decline. The potential
negative impact on some applications (e.g., criminal investigations) is obvious. Therefore, image/video forensics
is becoming increasingly important. Image/video forensics involves two important issues: source identification
and forgery detection. The former identifies the source digital devices (cameras, mobile phones, etc.), while the
latter tries to determine whether a digital image/video has undergone any form of manipulation or processing
after capture. In this paper, we focus on the second issue and propose a scheme for detecting forged images.
One of the most important applications of digital watermarking [1],[2] is the authentication of the content of
digital images. Digital watermarking can be thought of as an active (or intrusive [3]) approach because a specific
signal (i.e., a watermark) called an extrinsic fingerprint [17] must be hidden in an image/video. For example, the
uncompressed-domain scheme proposed in [1] hides a robust watermark and a fragile watermark simultaneously
in the same wavelet domain. The two watermarks are embedded by using positive and negative modulations,
respectively, and tampering can be detected by checking the relation between the watermarks. In contrast, the
compressed-domain scheme in [2] provides dual protection of JPEG images based on informed embedding and a
two-stage watermark extraction technique. The embedded watermark, which does not affect the visual quality of
the image, can be used to verify the image’s content. If digital watermarking equipment is not available, non-
watermarking (non-intrusive [3]) techniques that leverage the inherent properties or traces (i.e., the intrinsic
fingerprint [17]) of an image/video can be used to passively explore the content to identify the source and verify
the content’s integrity. However, the restricted portability and accessibility of original images generally make
blind forgery detection more attractive and feasible in many practical applications. These reasons motivate us to
develop a passive-blind scheme for verifying image content and localizing tampered regions without prior knowledge of the original image.
Tampering with, or forging, an image involves making subtle changes to the image’s gray levels. Generally,
such changes are imperceptible to the human eye, but some tiny variations can be detected by computer
processing techniques. Forgery detection schemes assume that tiny changes caused by tampering may modify
the underlying statistics of an image. To explore the changes and find useful clues for forgery detection, it is
necessary to analyze each process and possible type of noise (e.g., de-mosaicing, gamma correction, lighting,
and sensor noise) in the digital devices used to capture images. As mentioned in [6][12], there are several kinds
of passive schemes based on some characteristics of digital signal processing and capture devices, such as color
filter array [13][14], sensor pattern noise [4][15][16], camera response function [12], JPEG quantization and
blocking artifact [3][5][9][18][20], and quality modification [19]. For instance, Hsu et al. [12] proposed a sliced
image detection method that checks the consistency of a camera’s characteristics in different regions. In each
segment, an area’s intensity features and a cross fitting error that measures the consistency of the camera’s
responses are combined with a classifier to detect splicing. The phenomenon of a regular symmetrical shape in
the blocking artifacts of a JPEG image’s matrix is analyzed in [5], and a method for detecting cropped and
recompressed blocks is proposed. Avcibas et al. [19] suggested that any image will undergo quality degradation
after smoothing or low-pass filtering. The amount of degradation depends on the type of test image, particularly on whether the image has been manipulated. That is, by observing the differences between a test image and its smoothed version, it is possible to identify images that have been manipulated. The authors utilized a linear regression classifier with several statistical features derived from quality-measuring operators.
Since most digital images are stored and transmitted in JPEG format, it is possible that the content of tampered
regions is taken from other JPEG images. Most forgery detection methods only operate in the spatial domain, so
they may be vulnerable to JPEG compression. To reduce the impact of JPEG compression on forgery detection,
the properties introduced by JPEG compression should be exploited as detection features. Images that have been tampered
with (called tampered images hereafter) can be classified into the following four types, based on whether JPEG
compression is applied on the input and output images: raw/raw, raw/JPEG, JPEG/raw, and JPEG/JPEG. It is
highly likely that faked images belonging to the third and fourth categories occur because most low-cost
multimedia devices use JPEG compression for storage. To detect these two types, the information in the
quantization table of an image can be utilized. In [18], a blocking artifact measure is computed and used to detect
forged images. However, it is not clear how suspect regions can be selected and how the quantization matrix can
be estimated based on the un-tampered regions. This motivates us to develop an image forgery detection scheme
based on the information in the quantization table. In this paper, we focus on the third type of forgery, JPEG/raw.
The study of the last type of forgery, JPEG/JPEG, is beyond the scope of this paper.
The remainder of the paper is organized as follows. In Section II, we describe the system; and in Section III, we explain the proposed passive-blind forgery detection scheme. In Section IV, we detail the experiment results.

II. SYSTEM DESCRIPTION

Quantization is a key process that controls the image quality and bit rate in the JPEG compression standard.
After JPEG compression, some phenomena caused by quantization occur in the resulting image. The most
important phenomenon is that DCT coefficients at the same frequency become multiples of the corresponding
quantization step size after inverse quantization (IQ). This implies that there are several peaks in the multiples of
the corresponding quantization step size in the histogram of each DCT frequency. Therefore, the phenomenon
can be utilized to estimate the quantization step size; however, it may not be obvious if an image contains
suspect regions whose information is in different quantization tables. It is assumed that the information about
tampered areas and genuine areas will be in different tables. To determine whether an area has been tampered with, the quantization table can be checked for inconsistency by the quantization table estimation algorithm, which is the core of the proposed scheme.
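This key phenomenon can be illustrated with a minimal numerical sketch, using synthetic Laplacian-distributed values as stand-ins for real DCT coefficients: after quantization and inverse quantization, every coefficient is an exact multiple of the step size, so the step can be recovered from the surviving values.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic AC DCT coefficients for one frequency (Laplacian-like spread).
coeffs = rng.laplace(scale=20.0, size=5000)

q = 7  # hypothetical quantization step size
# JPEG-style quantization followed by inverse quantization.
reconstructed = np.round(coeffs / q) * q

# Every reconstructed coefficient is an exact multiple of q, so the
# histogram of this DCT frequency has peaks only at multiples of q.
assert np.all(reconstructed % q == 0)

# The step size can be recovered, e.g., as the GCD of the nonzero values.
values = np.unique(np.abs(reconstructed.astype(int)))
estimated_q = np.gcd.reduce(values[values != 0])
print(estimated_q)  # 7
```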
Figure 1 shows a block diagram of the proposed forgery detection scheme, which is based on quantization
table estimation. To deal with the third type of forgery (i.e., JPEG/raw), the scheme is implemented in three
phases: pre-screening, candidate region selection, and tampered region identification. First, since many images
are not JPEG compressed, to improve the efficiency of our scheme, we apply a pre-screening test to determine
whether or not a test image has been JPEG compressed. Note that the test image for the third type of forgery,
JPEG/raw, is derived from raw data. In the blind forgery detection process, the information about the
quantization table used in an image is unavailable, so it must be estimated directly from the image. Second, to
reduce the impact of tampered regions on quantization estimation, we select suitable regions as candidates for
quantization table estimation. It is assumed that the larger the number of genuine regions selected, the better the
accuracy of quantization table estimation will be. Finally, since the information about tampered areas is
unknown, a blind classifier [23] must be used in real applications. Similar to the concept proposed in [12], we
develop a blind classifier based on the inconsistency of the quantization table to identify tampered regions.
(a) A sub-image U n is an area comprised of a number of blocks (8×8 pixels), where n is the sub-image index.
The size of a sub-image is determined by the smallest number of blocks required for quantization table
estimation. (We discuss this aspect later in the paper.) Then, an image U contains several non-overlapping
sub-images, i.e.,
$U = \bigcup_{n} U_n$, (1)
where ∪ denotes the union operator. Figure 2 illustrates the relations between blocks and sub-images in an
image.
(b) $X^q = \{ x^q_{(i,j)}(k,l) \mid 0 \le k,l \le 7,\; 0 \le i \le \lfloor N_H/8 \rfloor,\; 0 \le j \le \lfloor N_W/8 \rfloor \}$ is the quantized DCT representation after quantization (based on 8×8 blocks), where (i,j) is the block index, $\lfloor\cdot\rfloor$ denotes the floor function, and $N_H \times N_W$ are the image's dimensions. Basically, the quantization process can be expressed as follows:

$Q(X) = X^q$ and $x^q_{(i,j)}(k,l) = \left[ \dfrac{x_{(i,j)}(k,l)}{q(k,l)} \right]$, (2)

where $x_{(i,j)}(k,l)$ represents the DCT coefficient, $q(k,l)$ denotes the quantization step size at the (k,l) position of a quantization table Q, $[\cdot]$ is the rounding operator, and $Q(\cdot)$ is a quantizer. In JPEG compression, the quantization table $Q = \{ q(k,l) \mid 0 \le k,l \le 7 \}$ is the same for all 8×8 blocks.
(c) $X^r = \{ x^r_{(i,j)}(k,l) \mid 0 \le k,l \le 7,\; 0 \le i \le \lfloor N_H/8 \rfloor,\; 0 \le j \le \lfloor N_W/8 \rfloor \}$ is the reconstructed DCT version after IQ:

$Q^{-1}(X^q) = X^r$ and $x^r_{(i,j)}(k,l) = x^q_{(i,j)}(k,l) \cdot q(k,l)$, (3)
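A minimal sketch of Eqs. (2)-(3), using a hypothetical 8×8 table (the step values are illustrative, not a real JPEG table); `np.round` stands in for the rounding operator $[\cdot]$.

```python
import numpy as np

# Hypothetical 8x8 quantization table (illustrative values only).
Q = np.arange(1, 65).reshape(8, 8)

rng = np.random.default_rng(1)
X = rng.normal(scale=50.0, size=(8, 8))  # DCT coefficients of one block

Xq = np.round(X / Q)  # Eq. (2): quantization (rounding operator)
Xr = Xq * Q           # Eq. (3): inverse quantization

# After IQ, each coefficient is a multiple of its step size q(k, l).
assert np.all(Xr % Q == 0)
```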
(d) S (k ,l ) denotes an integer sample sequence containing all the (k,l)-th AC coefficients in each 8×8 DCT
block.
(e) $h_{(k,l)}$ indicates the histogram of the (k,l)-th AC coefficient and can be expressed as $h_{(k,l)} = \{ h_{(k,l)}(t) \mid h_{(k,l)}(t) = \sum \delta(S_{(k,l)} - t),\; \min\{S_{(k,l)}\} \le t \le \max\{S_{(k,l)}\} \}$, where $\delta(\cdot)$ denotes the impulse function. The Fourier transform of $h_{(k,l)}$ is expressed as

$H_{(k,l)} = F(h_{(k,l)})$, $0 \le k,l \le 7$, (4)

(f) $\Psi_{(k,l)}$ is the power spectrum density (PSD) of the (k,l)-th AC coefficient and can be obtained by

$\Psi_{(k,l)} = \mathrm{Re}^2(H_{(k,l)}) + \mathrm{Im}^2(H_{(k,l)})$, $0 \le k,l \le 7$, (5)

where $\mathrm{Re}(\cdot)$ and $\mathrm{Im}(\cdot)$ denote the real and imaginary parts of a complex number, respectively.
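The PSD of Eqs. (4)-(5), and the observation (Section III) that its number of dominant peaks equals the quantization step size minus 1, can be reproduced numerically. The sketch below uses synthetic coefficients, clipped so the histogram support length is an exact multiple of the step size; these choices are illustrative assumptions, not part of the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
q = 5
# Synthetic (k,l)-th AC coefficients after a JPEG round-trip: multiples of
# q, clipped so the histogram support length is an exact multiple of q.
S = (np.round(rng.laplace(scale=10.0, size=4000) / q) * q).astype(int)
S = np.clip(S, -145, 145)

t = np.arange(-150, 150)                   # support of length 300
h = np.array([(S == v).sum() for v in t])  # histogram (definition (e))

H = np.fft.fft(h)                # Eq. (4): Fourier transform of h
psd = H.real**2 + H.imag**2      # Eq. (5): power spectrum density

# Count the dominant local maxima of the PSD, excluding the DC bin; the
# count equals the quantization step size minus 1.
thr = 0.5 * psd[1:-1].max()
peaks = [m for m in range(1, len(psd) - 1)
         if psd[m] > psd[m - 1] and psd[m] > psd[m + 1] and psd[m] > thr]
print(len(peaks) + 1)  # recovers q
```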
III. THE PROPOSED PASSIVE-BLIND FORGERY DETECTION SCHEME
In [18], the authors analyze the relationship between the power spectrum density (PSD) and the quantization step size used for each AC frequency component. We observe that, for each AC frequency component, the number of PSD peaks is equal to the quantization step size minus 1. To detect the number of PSD peaks accurately, the smoothed version of the second derivative of the PSD is obtained, and the number of its local minimums is counted to estimate the quantization step size. However, since different quantization step sizes generate different PSDs, it is difficult to estimate each quantization step size by a single simple method. To resolve the problem, we classify PSDs into different types based on the image content and then adjust the estimation algorithm to improve the performance of quantization table estimation. Therefore, our algorithm is a content-adaptive one.
The proposed algorithm deals with the quantization step size of each AC coefficient one by one. Two features are used to classify the PSDs of the AC coefficients into four categories. The first feature, $f_1$, is the number of local maximums $N_{\max}$ of each PSD, where the value of $N_{\max}$ is related to the quantization step size. If $f_1$ is zero, the quantization step size used for the coefficient may be large or equal to 1. The second feature, $f_2$, defines the shape factor of the PSD, and is used to evaluate whether or not the quantization step size is large, as shown in Figure 3. The bigger the quantization step size, the larger the shape factor will be. To measure a PSD's shape factor, we find the smallest bin index at which the PSD intersects a horizontal line. The level of the horizontal line is set empirically at 0.1 initially, and is increased progressively if no intersection exists. Therefore, based on the two features and two pre-defined thresholds, $T_1$ and $T_2$, four types of PSD can be distinguished.
(Table: the four PSD types, determined by comparing $f_1$ with threshold $T_1$ and $f_2$ with threshold $T_2$.)
Types I and IV indicate that small and large quantization step sizes are adopted, respectively. Type IV also
occurs when an image contains a large smooth area. As shown in Figs. 4(a) and 4(d), the local maximums
usually disappear. For Type I, the estimated quantization step size is set at 1, which means there is no
quantization. Meanwhile, for Type IV, the information about the AC coefficient is insufficient, so the
quantization table estimation process stops and the symbol “un-estimated” is assigned to the PSD.
For Types II and III, we can estimate the quantization step size based on the PSD. However, as shown in Figs.
4(b) and 4(c), it may not be easy to find the correct positions of local maximums in the PSD due to noise. The
quantization step size can also be estimated by using the Fourier transform of the PSD, since there exists periodicity in the locations of the PSD's local maximums. Combining the information extracted from the PSD and its Fourier transform should improve quantization step size estimation. Thus, based on the PSD and its Fourier transform, we use different algorithms to estimate each quantization step size for Types II and III. The basic unit in the estimation process is a sub-image.
Fig. 4. Examples of the PSDs of Lena: (a) Type I, AC(0,1), step size=1; (b) Type II, AC(0,2), step size=4; (c)
Type III, AC(4,5), step size=40; and (d) Type IV, AC(6,5), step size=75.
E-II-1. Find the bin index $P^{\Psi}_{(k,l)}$ of the first local maximum of $\Psi_{(k,l)}$. Then, compute the possible quantization step size $q^{\Psi}(k,l)$ from $P^{\Psi}_{(k,l)}$ (Eq. (6)), where $\lceil\cdot\rceil$, $\max\{\cdot\}$, and $\min\{\cdot\}$ denote the ceiling, maximum, and minimum operators, respectively.
E-II-2. Obtain $\Phi_{(k,l)}$, the Fourier transform of the PSD, and find the index of its first local maximum as the quantization step size $q^{\Phi}(k,l)$.
E-II-3. Stop the algorithm if $q^{\Psi}(k,l)$ is equal to $q^{\Phi}(k,l)$; otherwise, continue to check the values of $q^{\Psi}(k,l)$ and $q^{\Phi}(k,l)$.
E-II-4. Find all local maximums within a search range $[0, q_t(k,l)]$ and identify candidates whose values are larger than a threshold $T_{\Phi}$, where

$q_t(k,l) = \max\big(q^{\Psi}(k,l),\, q^{\Phi}(k,l)\big)$, (7)

$T_{\Phi} = 0.9 \cdot \max\big(\Phi_{(k,l)}(q^{\Psi}(k,l)),\, \Phi_{(k,l)}(q^{\Phi}(k,l))\big)$. (8)
E-II-5. Select the candidate with the minimum index as the possible step size $q(k,l)$.
E-II-6. Test whether there are peaks at the multiples of $q(k,l)$. If such multiples exist, $q(k,l)$ is the final estimated step size $q^*(k,l)$; otherwise, compare the values of $\Phi_{(k,l)}(q^{\Psi}(k,l))$ and $\Phi_{(k,l)}(q^{\Phi}(k,l))$. The index with the larger value is the final estimated step size $q^*(k,l)$.
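A simplified stand-in for step E-II-2, estimating the step size from the Fourier transform $\Phi$ of the PSD, can be sketched as follows. The data are synthetic, and a plain argmax replaces the paper's first-local-maximum search; because the PSD repeats every $\mathrm{len}(t)/q$ bins, its spectrum concentrates at multiples of $q$.

```python
import numpy as np

rng = np.random.default_rng(3)
q = 5
# Synthetic quantized coefficients, clipped onto a fixed integer grid.
S = (np.round(rng.laplace(scale=10.0, size=4000) / q) * q).astype(int)
S = np.clip(S, -145, 145)

t = np.arange(-150, 150)
h = np.array([(S == v).sum() for v in t])
psd = np.abs(np.fft.fft(h)) ** 2  # Eq. (5)

# Fourier transform of the PSD: the PSD is periodic with period len(t)/q,
# so its spectrum concentrates at multiples of q, and the dominant non-DC
# component sits at index q itself (cf. step E-II-2).
phi = np.abs(np.fft.fft(psd))
q_phi = int(np.argmax(phi[1:len(t) // 2])) + 1
print(q_phi)  # 5
```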
Since the rounding errors caused by the image format converter make peaks less sharp, especially in the high frequency bands, it may not be easy to determine the quantization step size from $\Phi_{(k,l)}$ for Type III. It is highly likely that the correct quantization step size is around the first local maximum of $\Phi_{(k,l)}$. To find the correct quantization step size within a range centered at that local maximum, each turning point of $\Phi_{(k,l)}$ (a point at which the first derivative of $\Phi_{(k,l)}$ is zero and the second derivative is not zero) is collected into a set $\Omega_{\Phi}$. Figure 5 illustrates the Fourier transform of a PSD with rounding errors; the points marked with red circles are the turning points.
The steps of the estimation algorithm for Type III are as follows:
E-III-1. Find the set $\Omega_{\Phi}$ of turning points around $q^{\Phi}(k,l)$ in $\Phi_{(k,l)}$.
E-III-2. Set $q^{\Phi}(k,l)$ as $q^*(k,l)$ if the size of $\Omega_{\Phi}$ is 1 and stop the estimation procedure; otherwise,
(1) If $q^{\Psi}(k,l)$ is inside $\Omega_{\Phi}$,

$q^*(k,l) = \arg\min_{m} \big| q^{\Psi}(k,l) - m \big|, \quad m \in \Omega_{\Phi}$. (9)
(2) If $q^{\Psi}(k,l)$ is outside of $\Omega_{\Phi}$, $q^{\Phi}(k,l)$ is set as $q^*(k,l)$.
A.4 Analysis of the number of blocks required for quantization table estimation
In addition to estimating each quantization step size, we need to analyze how many blocks are necessary for
quantization table estimation. In fact, the problem can be considered as finding the minimal number of blocks for
quantization table estimation under an acceptable error. To solve the problem, we define the MAE (mean absolute error) as

$E^{MAE}_{Q}(Q, Q^*) = \dfrac{\sum_{(k,l) \in \Omega_c} \left| q(k,l) - q^*(k,l) \right|}{|\Omega_c|}$, (10)

where $\Omega_c$ represents the set of AC coefficients that can be estimated, and $|\Omega_c|$ denotes the number of AC frequencies
in Ω c . According to Eq. (10), the minimal number of blocks should be decided when the number of estimated
AC coefficients is large and the number of errors is small. We perform simulations on 12 images. Figure 6
illustrates the relation between the average MAE (Eq. (10)) and the number of blocks used for quantization table
estimation. As shown in the figure, the average MAE approaches a constant when the size of a sub-image is larger than 256 blocks. Therefore, under the proposed scheme, the size of a sub-image is set as at least 256 blocks for quantization table estimation.
Fig. 6. Average MAEs using different numbers of blocks.
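Eq. (10) can be sketched directly; the tables below are hypothetical, with −1 marking an "un-estimated" (Type IV) entry that is excluded from $\Omega_c$.

```python
import numpy as np

Q_true = np.full((8, 8), 10)  # hypothetical true table
Q_est = Q_true.copy()
Q_est[0, 1] = 12              # one mis-estimated step (error 2)
Q_est[7, 7] = -1              # -1 marks an "un-estimated" entry (Type IV)

# Omega_c: estimable AC positions (exclude DC and un-estimated entries).
mask = np.ones((8, 8), dtype=bool)
mask[0, 0] = False
mask &= Q_est != -1

# Eq. (10): mean absolute error over the estimable coefficients.
mae = np.abs(Q_true[mask] - Q_est[mask]).mean()
print(mae)  # 2/62
```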
B. Pre-screening
As mentioned in Section II, the pre-screening phase, which filters out images that have not been JPEG compressed, is used to improve the efficiency of the proposed forgery detection scheme. Forgers usually try to
alter regions that contain important information, e.g., faces, marks, and numbers, while keeping the original
scene. In other words, the size of a tampered region is often relatively small compared to that of the whole image,
so that the JPEG compression feature can still be detected. Therefore, in the pre-screening phase, we look for the
JPEG compression feature to determine whether an input image has been JPEG compressed.
Recall that q (k , l ) = 1 in a quantization table indicates the (k,l)-th AC coefficients have not been quantized
(Section IIIA). It is assumed that the number of q (k , l ) = 1 will be large if an image without the JPEG
compression feature is tested. This motivates us to use the number of q (k , l ) = 1 to determine whether the JPEG
compression feature exists in an image. Specifically, after quantization table estimation, we develop a classifier
based on the number of q (k , l ) = 1 and apply the following decision rule: if the number of q (k , l ) = 1 is less than
a threshold TP , the image has been JPEG compressed; otherwise, it has not been JPEG compressed. To find a
suitable value for TP , we analyze the number of q (k , l ) = 1 . We select 1414 raw images and then JPEG
compress them with random QFs (quality factors) between 60 and 95. For these images, we count the number of
q(k , l ) = 1 after quantization table estimation. The probability of the number of q(k , l ) = 1 is shown in Fig. 7.
The distribution of the number of q (k , l ) = 1 for compressed images is similar to an exponential distribution.
Thus, we use an exponential distribution to model the distribution of the number of $q(k,l)=1$ for compressed images. Let t be the number of $q(k,l)=1$ for an image. It is modeled as an exponential distribution [11],

$p(t) = \lambda e^{-\lambda t}$, $t \ge 0$, (11)

because of its mathematical and computational tractability. To compute the exponential parameter $\lambda_i$, we utilize
the following maximum likelihood estimation method [10],[11]. Let $p(t) = p(t_1, t_2, \ldots, t_K)$ be the joint probability density function of K samples. Assuming the samples are independent, we can express the likelihood function $LH(t)$ as

$LH(t) = p(t \mid \lambda_i) = \prod_{j=1}^{K} \lambda_i e^{-\lambda_i t_j} = (\lambda_i)^K \exp\Big(-\lambda_i \sum_{j=1}^{K} t_j\Big) = \exp\Big(K \ln \lambda_i - \lambda_i \sum_{j=1}^{K} t_j\Big)$. (12)
The next step in estimating the maximum likelihood involves differentiating the likelihood function and setting the result to zero, i.e., $\frac{d}{d\lambda_i} LH(t) = 0$. Because the logarithmic function is strictly monotonically increasing, the same maximization can be obtained by differentiating the log-likelihood function, i.e., $\frac{d}{d\lambda_i} \ln(LH(t)) = 0$. It is straightforward to find the estimated exponential parameter as

$\hat{\lambda}_i = \Big( \dfrac{1}{K} \sum_{j=1}^{K} t_j \Big)^{-1}$. (13)
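Eq. (13) in code: the ML estimate is simply the reciprocal of the sample mean. The synthetic exponential samples below are stand-ins for the measured per-image counts of $q(k,l)=1$.

```python
import numpy as np

rng = np.random.default_rng(4)
lam_true = 0.25
# Synthetic stand-ins for the per-image counts of q(k,l) = 1.
t = rng.exponential(scale=1.0 / lam_true, size=1414)

# Eq. (13): the ML estimate is the reciprocal of the sample mean.
lam_hat = 1.0 / t.mean()
print(lam_hat)  # close to 0.25
```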
Since the goal of the pre-screening phase is to select as many images with the JPEG compression feature as
possible, the threshold should be determined under a reasonable missing rate. Here, many samples (K=1414) are
used for simulation. Based on Eq. (13) and the experiment results, the threshold TP can be set as 18 when the
missing rate is almost zero. Under this threshold ( TP =18), the false alarm rate for the test images is only 2.76%.
In [18], the authors did not explain clearly how to find suspect or tampered regions, even though such regions
may affect the accuracy of quantization table estimation. In addition, as mentioned in Section III.B, the size of
tampered regions is usually kept small. It should help reduce the impact of tampered regions on quantization
table estimation and improve the system’s ability to identify tampered regions if genuine regions can be suitably
selected from an image for quantization table estimation. Based on this assumption, we propose a candidate
region selection algorithm based on seed region generation and region growing. The algorithm first implements the
seed region generation operation to find a suitable seed region by removing suspect areas. Then, it obtains the
candidate region by merging other genuine regions with the selected seed region. To prevent tampered regions
from being included into the candidate region, a candidate region refinement step is included in the region
growing phase.
In real situations, the content of tampered regions may be derived from raw or JPEG images. If the content is
copied from other JPEG images, there is a high probability of inconsistency in the quantization table, so this
characteristic can be used to find tampered regions. On the other hand, if the content is derived from other raw
images, the number of $q(k,l)=1$ in the tampered region should be very large. These two characteristics, inconsistency in the quantization table and the number of $q(k,l)=1$, can be exploited by the candidate region selection algorithm.
The basic operating unit in seed region generation is a sub-image. Let a seed region contain $N_S$ sub-images. The problem then becomes how to select these $N_S$ sub-images. It is assumed that the amount of inconsistency in the
quantization table and the number of q (k , l ) = 1 are small in the seed region. The steps of seed region generation
are as follows.
S3. Use the estimated quantization table to re-quantize and inverse-quantize the sub-image, and calculate the mean square error (MSE) $E^{MSE}_x$ of the DCT coefficients before and after recompression as follows:

$E^{MSE}_x(U_n) = \dfrac{1}{|U_n|} \sum_{\substack{(i,j) \in U_n \\ (k,l) \in \Omega_c}} \big( x_{(i,j)}(k,l) - x^r_{(i,j)}(k,l) \big)^2$, (14)

where $|U_n|$ is the number of 8×8 blocks in $U_n$. Since the energy generated by distortion will be preserved in both the image and the DCT domains, evaluating and minimizing the distortion in both domains will have the same effect. This means that tiny changes resulting from tampering in the spatial domain can also be observed in the DCT domain.
S4. Count the number of q (k , l ) = 1 in the estimated quantization table for each sub-image.
S5. Eliminate any sub-image whose number of $q(k,l)=1$ is larger than a threshold $T_C$ (48), and obtain $U'$ by sorting the remaining sub-images in ascending order according to $E^{MSE}_x$. The parameter $T_C$ is determined based on experience gained in previous experiments. This step removes any tampered area whose size is larger than a sub-image and whose content is derived from raw images.
S6. Find the sub-image with the smallest MSE in U ′ (Eq. (14)), and take it as an initial seed.
S7. Compare each sub-image with the seed by computing the similarity of the estimated quantization table. To
evaluate the similarity of the estimated quantization tables of two sub-images, the MAE (mean absolute error) is computed as

$E^{MAE}_{Q}(Q_{(i1,j1)}, Q_{(i2,j2)}) = \dfrac{1}{|\Omega_c|} \sum_{(k,l) \in \Omega_c} \big| q_{(i1,j1)}(k,l) - q_{(i2,j2)}(k,l) \big|$, (15)

where $Q_{(i1,j1)}$ and $Q_{(i2,j2)}$ are the estimated quantization tables of the two sub-images $U_{n1}$ and $U_{n2}$,
respectively. Since the AC components in Type IV are labeled “un-estimated”, we only calculate the
difference between the AC components that can be estimated in both Q(i1, j1) and Q(i 2, j 2 ) .
S8. Choose each sub-image whose MAE (Eq. (15)) is less than 1 to form part of the seed region.
S9. Repeat steps S7−S8 until the seed region contains $N_S$ sub-images. If the number of sub-images in the seed region is less than $N_S$ after testing each sub-image in $U'$, discard the tested sub-images from $U'$ and repeat steps S6−S9 until the seed region contains $N_S$ sub-images. The reason that the number of sub-images in the seed region may be less than $N_S$ is that a tampered region may have the smallest MSE (Eq. (14)) in $U'$.
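The steps above can be sketched as follows. The per-sub-image statistics, the table values, and the choice $N_S = 3$ are hypothetical placeholders, and a single filtering/merging pass replaces the full iterate-and-discard loop of S9.

```python
import numpy as np

T_C = 48  # threshold on the number of q(k,l) = 1 entries (step S5)
N_S = 3   # required number of sub-images in the seed region

# Hypothetical per-sub-image statistics: estimated table, count of unit
# steps, and the recompression MSE of Eq. (14).
base = np.full((8, 8), 10)
subimages = [{"id": n, "table": base.copy(), "ones": 2, "mse": float(n)}
             for n in range(6)]
subimages[4]["ones"] = 60                    # raw-content region: many q = 1
subimages[5]["table"] = np.full((8, 8), 16)  # pasted from another JPEG

def table_mae(qa, qb):
    # Eq. (15) over the AC positions (DC excluded; all assumed estimable).
    mask = np.ones((8, 8), dtype=bool)
    mask[0, 0] = False
    return np.abs(qa[mask] - qb[mask]).mean()

# S5: drop sub-images with too many unit steps; sort the rest by MSE.
U = sorted((s for s in subimages if s["ones"] <= T_C), key=lambda s: s["mse"])
# S6: the smallest-MSE sub-image is the initial seed.
seed = [U[0]]
# S7-S9 (single pass): merge sub-images whose table MAE to the seed is < 1.
for s in U[1:]:
    if len(seed) >= N_S:
        break
    if table_mae(seed[0]["table"], s["table"]) < 1:
        seed.append(s)

print([s["id"] for s in seed])  # [0, 1, 2]
```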
As mentioned in Section III.A, the basic unit for quantization table estimation is a sub-image, and the
performance of the procedure improves when several genuine regions are used. To select candidate regions, we
enlarge the initial candidate region (i.e., the seed region) by merging other sub-images that have similar
quantization tables. Meanwhile, we refine the candidate region by eliminating blocks whose quantization tables are inconsistent with that of the seed region.
In fact, a tampered region, whose shape is often arbitrary, may partially cover several sub-images. This
implies that if the tampered blocks of all the sub-images are grouped compactly, it should be possible to merge
more genuine blocks with the seed region to form the candidate region, i.e., increase the number of genuine sub-
images in the candidate region. Therefore, the region growing algorithm increases the number of genuine sub-
The basic operating unit of the region growing algorithm is one block, as shown in Figure 8. We also define a
set U ′′ , which contains the sub-images that were not tested in the seed region generation phase. The steps of the
Fig. 8. The region growing process.
G1. Select one sub-image from U ′′ and combine it with the seed region as the first temporary version.
G2. Randomly permute the 8×8 blocks in the first temporary version to generate N k temporary blocks.
G3. Divide each of the ($N_k + 1$) temporary versions into ($N_S + 1$) sub-images and perform quantization table estimation for each of the ($N_S + 1$) sub-images. This yields $(N_k + 1)(N_S + 1)$ estimated quantization tables.
G4. Compute the MAE (Eq. (15)) between the quantization table of the seed region and each of the estimated quantization tables.
G5. Calculate the ratio $R_1 = \dfrac{\sum \eta(\mathrm{MAE} - 1)}{(N_k + 1)(N_S + 1)}$ over the $(N_k + 1)(N_S + 1)$ estimated quantization tables, where $\eta(\cdot)$ denotes the unit step function ($\eta(x) = 1$ for $x > 0$ and $\eta(x) = 0$ for $x \le 0$). If the ratio $R_1$ is less than a threshold $T_R$ (0.1), the first temporary version is adopted as the new seed region; otherwise,
a refinement operation is performed, as shown in Figure 9. First, we find the random image that
contains the sub-image with the largest MAE (i.e., the N p -th sub-image in the figure) and remove it.
Then, the remaining sub-images form a new seed region. It is expected that after random permutation
in Step G2, the sub-image with the largest MAE will cover most of the tampered region. This is why
we remove the sub-image with the largest MAE and let the other sub-images form a seed region.
G6. Repeat Steps G1−G5 until every sub-image in U ′′ has been tested.
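The acceptance test of step G5 can be sketched as follows, with hypothetical MAE values for the $(N_k+1)(N_S+1)$ estimated tables.

```python
import numpy as np

N_k, N_S = 4, 3
T_R = 0.1

rng = np.random.default_rng(6)
# Hypothetical table MAEs for the (N_k + 1) * (N_S + 1) estimated tables.
maes = rng.uniform(0.0, 0.8, size=(N_k + 1, N_S + 1))
maes[2, 1] = 3.0  # one sub-image disagrees with the seed region

# G5: eta(x) = 1 for x > 0, else 0; R1 is the fraction of MAEs above 1.
R1 = (maes - 1 > 0).sum() / ((N_k + 1) * (N_S + 1))
accept = R1 < T_R  # adopt the merged region, or refine otherwise
print(R1, accept)  # 0.05 True
```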
Fig. 9. The refinement operation performed on the $N_k$ random images (sub-images $N_1, \ldots, N_p, \ldots, N_k$).
After candidate region selection and quantization table estimation, we perform an inconsistency check to
identify tampered regions. The rationale behind the inconsistency check is that the estimated quantization table
in a candidate region will be different from that in suspect blocks. In other words, the inconsistency check
measures variations in the image content before quantization and after IQ based on the estimated quantization
table. To measure variations resulting from forgery, the mean absolute error (MAE) of the AC coefficients in each block is computed as

$E^{MAE}_x(x_{(i,j)}) = \sum_{\substack{0 \le k,l \le 7 \\ (k,l) \ne (0,0)}} \left| x_{(i,j)}(k,l) - q^*(k,l) \cdot \left[ \dfrac{x_{(i,j)}(k,l)}{q^*(k,l)} \right] \right|$, (16)

where $|\cdot|$ represents the absolute operator. To reduce the impact of an image's content on forgery detection, we normalize the MAE as

$\bar{E}^{MAE}_x(x_{(i,j)}) = \dfrac{E^{MAE}_x(x_{(i,j)}) - \min\big(E^{MAE}_x(x_{(i,j)})\big)}{\max\big(E^{MAE}_x(x_{(i,j)})\big) - \min\big(E^{MAE}_x(x_{(i,j)})\big)}$. (17)
If a block is genuine, we assume that a correct quantization table can be estimated, and that its $E^{MAE}_x(x_{(i,j)})$ will be smaller than that of a tampered block; i.e., we can determine whether an 8×8 block has been tampered with based on its $E^{MAE}_x(x_{(i,j)})$. In fact, deciding whether an 8×8 block is a tampered block can be regarded as a two-class classification problem. Therefore, to identify tampered blocks, we employ a classifier based on the maximum likelihood ratio (MLR).
Figure 10 shows the distributions of the normalized MAEs of tampered and genuine images. Clearly, we can model each class as a Gaussian distribution. For simplicity, we replace $\bar{E}^{MAE}_x(x_{(i,j)})$ with the symbol y. The probability distributions of measurements without/with tampering can then be modeled as Gaussian functions:
$p(y \mid \text{“0”}) = \dfrac{1}{(2\pi)^{1/2} \sigma_0} \exp\Big\{ -\dfrac{1}{2\sigma_0^2} (y - \mu_0)^2 \Big\}$, (18)

$p(y \mid \text{“1”}) = \dfrac{1}{(2\pi)^{1/2} \sigma_1} \exp\Big\{ -\dfrac{1}{2\sigma_1^2} (y - \mu_1)^2 \Big\}$, (19)
where $\mu_0$, $\mu_1$, $\sigma_0^2$, and $\sigma_1^2$ represent the respective means and variances of the two distributions. To assess whether a block has been tampered with, the Bayesian decision rule is applied:

$\tilde{\Lambda}(y) = \begin{cases} \text{“1”} & \text{if } \log L(y) \ge \log\Big( \dfrac{P(\text{“0”})(C_{10} - C_{00})}{P(\text{“1”})(C_{01} - C_{11})} \Big) \\[2mm] \text{“0”} & \text{if } \log L(y) < \log\Big( \dfrac{P(\text{“0”})(C_{10} - C_{00})}{P(\text{“1”})(C_{01} - C_{11})} \Big) \end{cases}$, (20)
where $C_{ij}$ denotes the cost of deciding class "i" when the true class is "j", $P(\text{"i"})$ is the prior probability of class "i", and the logarithm of the likelihood ratio $L(y)$ is given by

$$\log L(y) = \log\frac{p(y\,|\,\text{"1"})}{p(y\,|\,\text{"0"})} = \log\frac{\sigma_0}{\sigma_1} + \frac{(y-\mu_0)^2}{2\sigma_0^2} - \frac{(y-\mu_1)^2}{2\sigma_1^2}$$
$$= \left(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}\right)\frac{y^2}{2} + \left(\frac{\mu_1}{\sigma_1^2} - \frac{\mu_0}{\sigma_0^2}\right)y + \frac{\mu_0^2}{2\sigma_0^2} - \frac{\mu_1^2}{2\sigma_1^2} + \log\frac{\sigma_0}{\sigma_1}. \qquad (21)$$
Given symmetric classification costs (i.e., $C_{00}=C_{11}$ and $C_{01}=C_{10}$) and equal prior probabilities (i.e., $P(\text{"0"})=P(\text{"1"})$), the optimum Bayesian classifier $\tilde{\Lambda}(y)$ can be described in the following form:

$$\tilde{\Lambda}(y) = \begin{cases} \text{"1"} & \text{if } \log L(y) \ge 0 \\ \text{"0"} & \text{if } \log L(y) < 0 \end{cases}. \qquad (22)$$
Fig. 10. Probability distributions of the normalized MAE.
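Under the symmetric-cost, equal-prior assumptions, Eqs. (18)−(22) reduce to comparing two Gaussian log-likelihoods. A minimal sketch (the class parameters would be fitted from labeled normalized MAEs; the function names are ours, not the paper's):

```python
import math

def fit_gaussian(samples):
    """Estimate (mu, sigma^2) of one class from training measurements."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((s - mu) ** 2 for s in samples) / n
    return mu, var

def log_likelihood_ratio(y, mu0, var0, mu1, var1):
    """Eq. (21): log L(y) = log p(y|"1") - log p(y|"0")."""
    return (0.5 * math.log(var0 / var1)
            + (y - mu0) ** 2 / (2 * var0)
            - (y - mu1) ** 2 / (2 * var1))

def classify(y, mu0, var0, mu1, var1):
    """Eq. (22): decide "1" (tampered) if log L(y) >= 0, else "0"."""
    return "1" if log_likelihood_ratio(y, mu0, var0, mu1, var1) >= 0 else "0"
```

A measurement near the genuine-class mean is assigned "0", and one near the tampered-class mean is assigned "1".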
To evaluate the performance of the proposed forgery detection scheme, we select 1414 raw images of different sizes (256×256, 512×512, 384×512, and 512×384 pixels) as well as some popular images, such as Lena, F-16, Baboon, and Peppers. The experiment is comprised of four parts. Since quantization table estimation is the most important function of the proposed scheme, we evaluate the estimation performance first. We also compare our scheme with the method presented in [18]. The second, third, and fourth parts of the experiment evaluate the performance of the pre-screening, candidate region selection, and tampered region identification processes. To generate tampered images for the performance evaluation, we consider three ways of forging images: copy-paste tampering, inpainting, and composite tampering.
To evaluate the performance of quantization table estimation, we use two measurements: accuracy and MAE. The accuracy $A(Q,Q^*)$ between the true and estimated quantization tables is defined as follows:

$$A\!\left(Q,Q^{*}\right) = \frac{1}{\lvert\Omega_c\rvert} \sum_{(k,l)\in\Omega_c} \delta\!\left(q^{*}(k,l) - q(k,l)\right), \qquad (23)$$

where $\delta(\cdot)$ equals 1 when its argument is 0, and 0 otherwise.
It is assumed that the higher the accuracy, the better the performance of quantization table estimation. In addition, similar to Eq. (15), $E_Q^{MAE}\!\left(Q,Q^{*}\right)$ is also used for the performance evaluation.
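Both table-level measurements can be sketched as follows; `omega_c` stands for the coefficient-position set Ω_c, which we treat as given, and `table_mae` is our stand-in for the $E_Q^{MAE}$ of Eq. (15), which is not reproduced in this section:

```python
import numpy as np

def table_accuracy(q_true, q_est, omega_c):
    """Eq. (23): fraction of positions in omega_c whose estimated
    step size matches the true one."""
    hits = sum(q_est[k, l] == q_true[k, l] for (k, l) in omega_c)
    return hits / len(omega_c)

def table_mae(q_true, q_est, omega_c):
    """Mean absolute error between true and estimated step sizes
    over omega_c (our approximation of Eq. (15))."""
    return sum(abs(q_est[k, l] - q_true[k, l]) for (k, l) in omega_c) / len(omega_c)
```

Higher accuracy and lower MAE both indicate better estimation, matching the interpretation used in Table II.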
Twelve images, each comprised of 512×512 pixels, are selected for the performance evaluation of quantization table estimation. Each image is JPEG compressed with the QF varied from 40 to 90. In total, there are 372 JPEG compressed images for evaluation. The parameters $T_1$ and $T_2$ are set to 0 and 100, respectively. The results, listed in Table II, show that the average accuracy and MAE are 90.43% and 0.1346, respectively. Overall, the results show that the proposed scheme can estimate the quantization step sizes well, regardless of the QF and image content. In addition, the accuracy of the proposed scheme on Baboon and Earth is 99.64% and 81.21%, respectively. The results show that the scheme achieves a better performance on textured images, because more energy is spread across the AC frequency band of a textured image, which lowers the difficulty of quantization table estimation.
Table II. Accuracy $A(Q,Q^*)$ and MAE $E_Q^{MAE}(Q,Q^*)$ of quantization table estimation for each test image.
We compare our scheme with the approach in [18] under JPEG compression with different QFs, as shown in Fig. 11. The experiment results demonstrate that the MAEs ($E_Q^{MAE}(Q,Q^*)$) of the proposed scheme are lower than those reported in [18] for the two images when the QF is varied from 60 to 90. Hence, the performance of the proposed scheme is better than that of [18].
B. Pre-screening Performance
Before evaluating the pre-screening performance, we introduce some performance indices, namely, the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) rates:

$$TP = \frac{N_P}{N_P+N_M},\quad TN = \frac{N_N}{N_N+N_F},\quad FP = \frac{N_F}{N_N+N_F},\quad FN = \frac{N_M}{N_P+N_M}, \qquad (24)$$

where $N_P$, $N_N$, $N_M$, and $N_F$ are the numbers of correctly detected JPEG compressed images, correctly detected non-JPEG compressed images, missed detections, and false alarms, respectively; $(N_P+N_M)$ is the total number of true JPEG compressed images; and $(N_N+N_F)$ is the total number of true non-JPEG compressed images. The pre-screening performance is considered good when TP and TN are high simultaneously.
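Eq. (24) can be sketched in a few lines; the count variables mirror $N_P$, $N_N$, $N_M$, and $N_F$:

```python
def prescreen_rates(n_p, n_n, n_m, n_f):
    """Eq. (24): detection rates from counts.
    n_p: correctly detected JPEG images, n_n: correctly detected non-JPEG,
    n_m: missed detections, n_f: false alarms."""
    tp = n_p / (n_p + n_m)  # fraction of true JPEG images detected
    tn = n_n / (n_n + n_f)  # fraction of true non-JPEG images detected
    fp = n_f / (n_n + n_f)  # false-alarm rate
    fn = n_m / (n_p + n_m)  # miss rate
    return tp, tn, fp, fn
```

Note that TP and FN share a denominator and sum to 1, as do TN and FP, so reporting TP and TN suffices.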
Fig. 11. Comparison of quantization table estimation: (a) Lena, (b) F16.
A receiver operating characteristic (ROC) graph is a common technique for visualizing and evaluating the performance of a classifier [10]. Basically, a classifier's performance is considered good if its ROC curve approaches the upper left-hand corner, i.e., the TP rate is high and the FP rate is low simultaneously. Since the pre-screening function in the proposed scheme behaves like a classifier, we use it to filter out uncompressed images. To this end, the 1414 images are JPEG compressed with random QFs, which yields a test set containing 2828 images (1414 non-JPEG compressed; 1414 JPEG compressed) for the performance evaluation. The trend of the ROC curve for the pre-screening phase, shown in Fig. 12, is toward the upper left-hand corner. Over all test images, the TP and TN of the pre-screening classifier with a pre-defined threshold (18) are 100% and 97.24%, respectively. The results show that the pre-screening function can easily determine if an input image has been JPEG compressed; therefore, the function improves the efficiency of the proposed forgery detection scheme.
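An ROC curve like the one in Fig. 12 can be traced by sweeping a decision threshold over the pre-screening score. A generic sketch (the exact score definition and its decision direction are abstracted away here; labels use 1 for JPEG compressed, 0 for uncompressed):

```python
def roc_points(scores, labels):
    """One (FP rate, TP rate) point per candidate threshold.
    scores: per-image decision scores; labels: 1 = JPEG, 0 = non-JPEG."""
    p = sum(labels)            # number of true JPEG images
    n = len(labels) - p        # number of true non-JPEG images
    pts = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        pts.append((fp / n, tp / p))
    return pts
```

Plotting these points from low to high FP rate yields the ROC curve; a curve hugging the upper left-hand corner indicates a good separation between the two classes.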
Fig. 12. ROC curve of the pre-screening process.
To evaluate the performance of candidate region selection, we create two fake images containing tampered regions, each measuring 100×100 pixels, as shown in Fig. 13(b). The tampered regions are marked by red lines. Figure 13(c) shows the result of seed region generation when $N_S$ is equal to 4. As we can see in Fig. 13(b), the tampered region covers sub-images 11, 12, 15, and 16; hence, those sub-images are not included in the seed region, nor are they chosen during candidate region selection. The result, shown in Fig. 13(d), demonstrates that the proposed candidate region selection process can find genuine regions for quantization table estimation effectively.
We also analyze the impact of candidate region selection on quantization table estimation when an image has been tampered with. The Lena image is used as a base to generate several JPEG-decoded versions. Similar to the approach in [21], we insert tampered regions of different sizes captured from raw data or from JPEG compressed images with different QFs. The QFs of the tampered regions are 50, 60, 70, and 80, and the size of the regions ranges from 64×64 to 200×200 pixels. For each tampered image, quantization table estimation with/without candidate region selection (CRS) is performed 20 times and the MAE ($E_Q^{MAE}(Q,Q^*)$) is measured. Table III shows the experiment results. The average MAE of quantization table estimation with CRS is lower than that without CRS. We observe that the average MAE can be reduced from 14.75 to 0.77 when the QF is 60. The results demonstrate that using CRS improves the estimation performance of the proposed scheme. In addition, as shown in Table III, the average MAEs for raw data are higher than those for JPEG data. This means that the phenomenon of multiple peaks after JPEG compression is more pronounced when raw data is used. The reason is that the estimated quantization step sizes are all 1 and the MSE in Eq. (14) is very small when the content of the tampered sub-image is derived from a raw image. Since the tampered sub-images may be chosen during seed region generation for quantization table estimation, the number of $q(k,l)=1$ occurrences is adopted as a feature in the generation phase. The results show that the performance of quantization table estimation can be improved by using CRS based on these two features, i.e., the inconsistency in the quantization table and the number of $q(k,l)=1$ occurrences.
Since copy-paste tampering [6],[12] is a common forgery method, we use it to evaluate the proposed scheme’s
forgery detection capability. Block mis-matching while generating the tampered image might affect the
performance of forgery detection, so we also discuss its impact here. Figure 14 shows two Baboon images
altered by copy-paste tampering. Similar to [22], the tampered regions can be copied from the original raw image.
Figures 14(a) and 14(b) are generated under block mis-matching and block matching, respectively. The visual
quality of both images is good. Figures 14(c) and 14(d) show the results of forgery detection. The white lines in
the four images indicate the tampered regions that the proposed scheme identified.
In fact, tampered regions may be copied from other JPEG images [21],[22]. It is expected that the QF used in the tampered region may affect the performance of forgery detection. Next, we evaluate the performance of our forgery detection scheme on tampered regions with different QFs. The results are shown in Fig. 15. The QF of the input image is 75, and the QFs of the tampered regions in Figs. 15(a1)−(a4) are 65, 70, 80, and 85, respectively. The PSNR between the input image and each modified version is 49.70 dB, 51.74 dB, 51.01 dB, and 49.53 dB, respectively. The results show that it is difficult for the human eye to identify the tampered regions. Figures 15(b1)−(b4) show the tampered regions (non-black blocks), which were correctly located by the proposed scheme. The results demonstrate that the scheme can identify tampered regions whose content is derived from both raw and JPEG images.
D.2. Inpainting
The goal of inpainting is to extract the structure and texture information of known or existing regions in order to reconstruct an unknown or lost area. Though inpainting can be utilized to reconstruct missing or degraded parts of images and videos, it may be abused to alter image content and generate a fake image [22]. Here we use some inpainting methods ([7],[8] (inpainting software)) and the image processing tool PhotoShop to generate test images containing tampered regions of different sizes for the performance evaluation. First, tampered regions are generated manually by using the image processing tool. The images are shown in the first row of Fig. 16. In Fig. 16(b1), the size of the tampered region is about 95×70 pixels. Fig. 16(c1) illustrates the result of forgery detection. The second and third rows of the figure show the detection results derived by the approaches in [7] and [8], respectively. In Figs. 16(b2) and 16(b3), the sizes of the tampered regions are 88×88 pixels and 41×153 pixels, respectively. These two figures show that it is difficult for the human eye to determine whether the image is a fake. The detected regions are shown in Figs. 16(c2) and 16(c3). As we can see in Fig. 16, the tampered regions can be identified by using the proposed scheme. The results in Fig. 16 demonstrate that the proposed scheme performs well, even if the tampered regions are generated by inpainting methods and an image processing tool.
D.3. Composite tampering
In composite tampering, a composite image is created by using an inpainting method and a copy-paste process. Figures 17(a) and 17(b) show the original images. We remove the squirrel from Fig. 17(a) by using the inpainting tool described in [8], and then copy and paste the other one extracted from Fig. 17(b) to generate a tampered image. The tampered region covers about 20% of the image in Fig. 17(a), and the fake image is shown in Fig. 17(c). It is difficult to determine whether the image in Fig. 17(c) is a fake. As shown in Fig. 17(d), the two tampered areas are localized well. The detection result demonstrates that the proposed scheme also performs well on composite tampering.
Fig. 17. Composite tampering: (a),(b) original images, (c) tampered version, and (d) forgery detection.
D.4. Evaluation of the MLR classifier
To evaluate the overall performance of the proposed forgery detection scheme, we randomly generate 1240 fake images in which the side length of the tampered square regions ranges from 32 to 200 pixels. Based on the proposed MLR classifier, the TP and TN are 96.92% and 95.55%, respectively. The results show that the MLR classifier in our scheme can detect tampered regions effectively, irrespective of their size.
V. CONCLUSION
In this paper, we have proposed a passive-blind scheme for detecting forged images. Since quantization is an
important process and checking the inconsistency of quantization tables can be used for tampering detection,
quantization table estimation is a key component of the proposed scheme. To improve the accuracy of
quantization table estimation, we developed a content-adaptive algorithm that first classifies the AC DCT
coefficients into different types. Then, the quantization step size to be used is measured adaptively from the PSD
and the PSD’s Fourier transform. The proposed scheme is comprised of three phases: pre-screening, candidate
region selection, and tampered region identification. To determine whether an input image has been JPEG
compressed, we count the number of times the quantization step size is equal to 1, and then filter out images
without the JPEG compression feature during the pre-screening phase. For quantization table estimation during
the candidate region selection phase, we devise an algorithm for seed region generation and region growing. The
seed region generation step finds a suitable seed region by removing suspect regions. Then, based on the seed
region, the candidate region is formed by merging other genuine regions with the seed. To avoid merging
suspect regions, a candidate region refinement operation is included in the region growing step. After estimating the quantization table from the candidate region, an MLR classifier exploits the inconsistency of the quantization table to identify tampered regions.
To evaluate the proposed scheme, a large number of images with different QFs are generated. The average
MAE for quantization table estimation in the proposed scheme is small. The experiment results show that the
scheme can estimate quantization tables with a high degree of accuracy. In terms of candidate region selection,
the results demonstrate that genuine blocks can be grouped together for quantization table estimation. In addition,
we use three common types of forgery, namely, copy-paste tampering, inpainting, and composite tampering, to generate fake images for the performance evaluation. The experiment results show that the proposed forgery detection scheme can identify tampered regions effectively, regardless of whether the tampered content is derived from raw images or JPEG compressed images.
REFERENCES
[1] C.-S. Lu and H.-Y. Mark Liao, “Multipurpose Watermarking for Image Authentication and Protection,”
IEEE Trans. Image Processing, vol. 10, no. 10, pp. 1579−1592, 2001.
[2] W.-N. Lie, G.-S. Lin, and S.-L. Cheng, “Dual protection of JPEG images based on informed embedding and two-stage watermark extraction techniques,” IEEE Trans. Information Forensics and Security, vol. 1, no. 3, pp. 330−341, 2006.
[3] Y. L. Chen and C.-T. Hsu, “Image tampering detection by blocking periodicity analysis in JPEG compressed images,” in Proc. IEEE 10th Workshop on Multimedia Signal Processing, pp. 803−808, 2008.
[4] C.-C. Hsu, T.-Y. Hung, C.-W. Lin, and C.-T. Hsu, “Video forgery detection using correlation of noise
residue,” in Proc. IEEE International Workshop on Multimedia Signal Processing, pp. 170−174, 2007.
[5] W. Luo, Z. Qu, J. Huang, and G. Qiu, “A novel method for detecting cropped and recompressed image block,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. II-217−II-220, 2007.
[6] T. V. Lanh, K.-S. Chong, S. Emmanuel, and M. S. Kankanhalli, “A survey on digital camera image forensic methods,” in Proc. IEEE International Conference on Multimedia & Expo, pp. 16−19, 2007.
[7] Jin-Bing Huang, Edge point detection and texture analysis for image inpainting, Master Thesis, National
Chung Hsing University, Taiwan, 2006.
[8] Teorex, http://www.teorex.com/inpaint.html
[9] J. Fridrich, M. Goljan, and R. Du, “Steganalysis based on JPEG compatibility,” in Proc. SPIE Multimedia
Systems and Applications IV, pp.275−280, 2001.
[10] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, 2001.
[11] H. V. Poor, An Introduction to Signal Detection and Estimation, Springer, 1994.
[12] Y.-F. Hsu and S.-F. Chang, “Image splicing detection using camera response function consistency and
automatic segmentation,” in Proc. IEEE Conf. Multimedia Expo., pp. 28−31, July 2007.
[13] A.C. Popescu and H. Farid, “Exposing digital forgeries in color filter array interpolated images,” IEEE
Trans. Signal Processing, vol. 53, no.10, pp. 3948−3959, Oct. 2005.
[14] S. Bayram, H. T. Sencar, and N. Memon, “Source camera identification based on CFA interpolation,” in
Proc. IEEE Int. Conf. Image Processing, vol. 3, pp. III-69-72, 2006.
[15] J. Lukáš, J. Fridrich, and M. Goljan, “Digital camera identification from sensor pattern noise,” IEEE Trans.
Information Forensics Security, vol. 1, no. 2, pp. 205−214, 2006.
[16] M. Chen, J. Fridrich, and J. Lukáš, “Determining image origin and integrity using sensor pattern noise,”
IEEE Trans. Information Forensics Security, vol. 3, no. 1, pp. 74−90, 2008.
[17] A. Swaminathan, M. Wu, and K. J. R. Liu, “Digital image forensics via intrinsic fingerprints,” IEEE Trans.
Information Forensics Security, vol. 3, no. 1, pp. 101−117, 2008.
[18] S. Ye, Q. Sun and E.-C. Chang, “Detecting digital image forgeries by measuring inconsistencies of
blocking artifact,” in Proc. IEEE International Conference on Multimedia & Expo, pp. 12−15, 2007.
[19] I. Avcibas, S. Bayram, N. Memon, B. Sankur, and M. Ramkumar, “A Classifier Design for Detecting
Image Manipulations,” in Proc. IEEE Int. Conf. Image Processing, vol. 4, pp. 24−27, 2004.
[20] H. Farid, “A Survey of Image Forgery Detection,” IEEE Signal Processing Magazine, vol. 26, no. 2, pp. 16−25, 2009.
[21] H. Farid, “Exposing digital forgeries from JPEG ghosts,” IEEE Trans. Information Forensics and Security,
vol. 4, no. 1, pp.154−160, 2009.
[22] W. Li, Y. Yuan, and N. Yu, “Passive detection of doctored JPEG image via block artifact grid extraction,”
Signal Processing, vol. 89, pp.1821−1829, 2009.
[23] S. Bayram, I. Avcibas, B. Sankur, and N. Memon, “Image manipulation detection,” Journal of Electronic Imaging, vol. 15, pp. 041101-1−041101-17, 2006.