
Signal Processing: Image Communication 63 (2018) 149–160


A novel contrast enhancement forensics based on convolutional neural networks✩
Jee-Young Sun, Seung-Wook Kim, Sang-Won Lee, Sung-Jea Ko *
Department of Electrical Engineering, Korea University, Anam-ro 145, Seongbuk-gu, Seoul, 02481, Republic of Korea

Keywords: Digital image forensics; Contrast enhancement; Convolutional neural networks; Deep learning; Gray level co-occurrence matrix

Abstract

Contrast enhancement (CE), one of the most popular digital image retouching technologies, is frequently utilized for malicious purposes. As a consequence, verifying the authenticity of digital images in CE forensics has recently drawn significant attention. Current CE forensic methods can be performed using relatively simple handcrafted features based on first- and second-order statistics, but these methods have encountered difficulties in detecting modern counter-forensic attacks. In this paper, we present a novel CE forensic method based on a convolutional neural network (CNN). To the best of our knowledge, this is the first work that applies a CNN to CE forensics. Unlike the conventional CNN in other research fields, which generally accepts the original image as its input, the proposed method feeds the CNN with the gray-level co-occurrence matrix (GLCM), which contains traceable features for CE forensics and is always of the same size, even for input images of different resolutions. By learning hierarchical feature representations and optimizing the classification results, the proposed CNN can extract a variety of appropriate features to detect the manipulation. The performance of the proposed method is compared to that of three conventional forensic methods. The comparative evaluation is conducted on a dataset consisting of unaltered images, contrast-enhanced images, and counter-forensically attacked images. The experimental results indicate that the proposed method outperforms conventional forensic methods in terms of forgery-detection accuracy, especially in dealing with counter-forensic attacks.

✩ This paper has supplementary downloadable material available at http://doi.org/10.6084/m9.figshare.5160982.v1, provided by the authors. It includes a .mat format dataset and .py format source codes for training and testing the proposed CNN for contrast enhancement forensics.
* Corresponding author.
E-mail addresses: jysun@dali.korea.ac.kr (J.-Y. Sun), swkim@dali.korea.ac.kr (S.-W. Kim), swlee@dali.korea.ac.kr (S.-W. Lee), sjko@korea.ac.kr (S.-J. Ko).
https://doi.org/10.1016/j.image.2018.02.001
Received 27 September 2017; Received in revised form 11 January 2018; Accepted 1 February 2018; Available online 12 February 2018
0923-5965/© 2018 Elsevier B.V. All rights reserved.

1. Introduction

As image and video editing techniques rapidly develop, image manipulation has become an easy process that can be exploited for malicious purposes, such as copyright infringement and spreading false information in the news media or litigation. In recent years, various digital forensic methods have been proposed to verify the authenticity of multimedia data. Digital image forensics identifies traceable statistical artifacts left behind after an image alteration and distinguishes forgeries from unaltered images. In general, image manipulation leaves unique fingerprints on images; thus, most digital image forensic methods focus on detecting different types of image manipulations, which are broadly divided into two categories: (1) content-preserving operations, including resampling [1], compression [2], median filtering [3,4], and contrast enhancement (CE) [5–7]; and (2) content-changing operations, such as splicing and copy-move manipulation [8–10]. Although the content-preserving operations may not pertain to malicious image tampering, detecting these alterations is still forensically significant. In particular, the detection of globally applied CE manipulation can provide insight into the processing history of an image [5]. Furthermore, since CE is frequently employed to disguise the evidence of image tampering, detecting such a manipulation can provide useful prior information for the identification of content-changing operations. Thus, this paper focuses on the development of a forensic method for detecting the CE manipulation.

Several CE forensic approaches [5–7] have been proposed in recent years. Earlier CE forensic methods [5,6] utilized the manipulation traces that can be observed in the 1D grayscale histogram of images. These simple 1D histogram-based evidences, however, are easily concealed by further processing, which may render current CE forensic methods unreliable. Techniques introduced for this purpose are referred to as counter-forensics, or anti-forensics. Some anti-forensic techniques [11,12] fool the 1D histogram-based CE forensic methods [5,6] by removing the visual clues in the 1D histogram through adding pixel-wise random noise to the image.
To address such anti-forensic methods, De Rosa et al. [7] proposed a CE forensic algorithm to classify the unaltered, contrast-enhanced, and counter-forensically attacked images employing a 3-class support vector machine (SVM) training scheme. Nevertheless, a recently developed optimization-based anti-forensic attack [13] performs CE without a significant distortion of the first- and second-order statistics, which effectively deceives the conventional CE forensic methods. The aforementioned conventional CE forensic methods [5–7] utilize handcrafted features based on the observed visual clues and have separate feature extraction and classification stages that cannot be simultaneously optimized in an iterative scheme. As a result, most conventional methods exhibit unsatisfactory performance when detecting images that are manipulated by counter-forensic attacks, even though they can detect commonly contrast-enhanced images fairly well.

To cope with not only common CE manipulations but also counter-forensic attacks, we propose a novel CE forensic method based on a deep learning framework. A convolutional neural network (CNN) [14] is a deep multi-layer neural network motivated by how human brains process visual information. It learns feature representations by using convolution kernels and automatically fulfills the classification. Then, the classification result is utilized to guide the feature extraction through a back-propagation algorithm. CNNs have shown impressive performance in modern artificial intelligence (AI) tasks that include object detection and segmentation [15], and recently, the CNN has been applied in a number of approaches [16–22] regarding digital image forensics. These approaches focus on the detection of specific image manipulation techniques such as median filtering [16,17], Gaussian blurring [17], JPEG compression [18–20], and some content-changing operations [21,22]. In these methods, an input image itself or a specific form of input is fed into the CNN structure, and the appropriate features are extracted through an iterative training scheme. In the proposed method of the present study, the GLCM is utilized as the input to the CNN structure.

From the perspective of CE forensics, input images of different resolutions need to be resized to the fixed input resolution of the CNN structure. The image resizing process, however, damages the pixel intensity information, which may be harmful for CE forensics in the extraction of traceable features. Image cropping does not change the pixel intensities, but checking the cropped patches of the input image one by one with the CNN could be a cumbersome process. Moreover, features that are trained with the input image itself are more related to the image content than to the manipulation evidence. Therefore, in the proposed method, instead of resizing or cropping the input image, the GLCM of the input image is computed and fed into the CNN. The GLCM is obtained by accumulating the occurrences of the pixel intensity pairs between each pixel and its neighboring pixels. Moreover, the GLCM can always be obtained with the same size, even if the input images have different resolutions. A detailed explanation of the GLCM is provided in Section 3.1.

In the proposed CNN-based CE forensic method, instead of constructing better handcrafted features, a wide range of global and local features that can distinguish between the GLCMs of unaltered images and those of forgeries are automatically learned while optimizing the classification results. To the best of our knowledge, this work is the first CE forensic attempt that employs a CNN and uses the GLCM as its input. The trained features from the GLCM are more powerful than the previous handcrafted features and are even more accurate than the features trained using the image itself. To verify the effectiveness of the GLCM, we compared the forgery detection performance of deep learning-based CE forensic models trained using three different types of inputs (i.e., the resized input image, a randomly cropped input image, and the GLCM of the input image). Then, the performance of the proposed method was compared with that of three state-of-the-art CE forensic methods. This comparative evaluation is conducted on a dataset consisting of unaltered images, three different types of images processed by common CE technologies, and images attacked by using three state-of-the-art counter-forensic approaches [11–13]. The experimental results show that the proposed method handles not only the common CE manipulations but also the recent counter-forensic attacks. Moreover, the proposed method outperforms the conventional ones in terms of forgery detection accuracy. The remainder of this paper is organized as follows. Section 2 briefly reviews previous work on CE forensics and deep learning-based digital image forensics. Section 3 describes the proposed CNN-based CE forensic method. Experimental settings and results are provided in Sections 4 and 5, respectively, while Section 6 concludes the paper.

2. Related works

2.1. Contrast enhancement forensic approaches

Several approaches [5–7] have been proposed to identify whether a digital image has been processed by CE or not. Earlier CE forensic methods [5,6] observed the differences between the 1D grayscale histogram of the unaltered image and that of the manipulated one. Stamm and Liu [5] maintained that the 1D histogram of a pure image generally has a smooth envelope, whereas that of a contrast-enhanced image has a ragged envelope that contains peaks and valleys. In this method, the Fourier transform of the 1D histogram is computed and the high-frequency energy F is calculated. The target image is identified as contrast-enhanced when F is larger than a certain threshold value. Cao et al. [6] introduced a slightly different observation on the 1D histogram of the contrast-enhanced image. Since histogram peaks can also be caused by other image processing techniques such as JPEG compression, they considered only the zero-height gaps in the 1D histogram as the trace of CE. In this method, the number of gap bins N in the 1D histogram is counted; if N is larger than a certain threshold, the image is classified as contrast-enhanced. These simple 1D histogram-based features, however, are easily removed by processing with counter-forensic techniques.
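As a rough illustration of the two 1D-histogram features discussed above, the sketch below computes a simplified high-frequency energy and a zero-gap-bin count in Python; the cutoff frequency, the normalization, and the gap definition are illustrative assumptions and not the exact formulations given in [5] and [6].

```python
import numpy as np

def histogram_features(gray_img, cutoff=32):
    """Simplified stand-ins for the two 1D-histogram features.

    `f_energy` mimics the high-frequency energy F of Stamm and Liu [5]:
    the share of the histogram's Fourier spectrum above a cutoff bin.
    `n_gaps` mimics the gap-bin count N of Cao et al. [6]: zero-height
    bins whose immediate neighbors are populated. Both are sketches,
    not the published definitions.
    """
    hist, _ = np.histogram(gray_img.ravel(), bins=256, range=(0, 256))

    # High-frequency energy of the histogram spectrum (illustrative cutoff).
    spectrum = np.abs(np.fft.fft(hist))
    f_energy = np.sum(spectrum[cutoff:256 - cutoff]) / np.sum(spectrum)

    # Zero-height gap bins surrounded by non-empty bins.
    n_gaps = sum(
        1 for k in range(1, 255)
        if hist[k] == 0 and hist[k - 1] > 0 and hist[k + 1] > 0
    )
    return f_energy, n_gaps
```

On an unaltered image both quantities tend to stay small, while histogram stretching or gamma correction introduces the peaks and empty bins that drive them up, which is the behavior Table 1 later quantifies.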
To address this, De Rosa et al. [7] tried to classify images into unaltered, contrast-enhanced, and counter-forensically attacked images by introducing an SVM training-based CE forensic algorithm that uses the GLCM. In this method, a 3-class SVM is utilized for the classification, and the feature space for SVM training is a 1D histogram calculated by accumulating the variance of each row of the GLCM. Although this work is the first attempt to apply second-order statistics to CE forensics, it is difficult to say that the 2D information was well utilized, because the GLCM was only used to generate another simple 1D feature. The aforementioned CE forensic methods [5–7] are based on manually selected features and have separate stages for feature extraction and classification, thereby yielding unsatisfactory performance in the detection of images manipulated by counter-forensic attacks.

2.2. Deep-learning based digital image forensics

Although deep learning-based CE forensics has not been studied, a number of attempts have been made to apply deep learning models to other manipulations in digital image forensics. One of the first works that employed CNNs for image forensics is [16]. In this paper, the authors developed a CNN-based median-filtered image detector. To better expose the manipulation traces, a pre-processing step was added to obtain the median filter residual, which is defined as the difference between the input image and its median-filtered output. By feeding the residual into the CNN, the detector could extract the forgery-related features and achieved a performance superior to that of a detector trained by the direct use of the image as the input. In [17], instead of the preprocessing step, Bayar and Stamm proposed a constrained convolutional layer that enables the CNN to focus on the local relationships among the pixels; the layer is constrained to suppress the image content so that content-dependent features cannot be learned.


This method can deal not only with median filtering, but also with other filter-based alterations such as Gaussian blurring, the addition of additive white Gaussian noise (AWGN), and bilinear interpolation.

Wang et al. [18] proposed a CNN-based double JPEG compression detector that exploits a 1D feature vector, a histogram consisting of 99 bins, as the CNN input. They first generated nine histograms of the DCT coefficients arranged in zigzag order at the first nine alternating current (AC) sub-bands. For each sub-band, only the coefficient values in the range of {−5, −4, −3, −2, −1, 0, 1, 2, 3, 4, 5} are accumulated into a histogram, so the concatenation of the nine 11-bin histograms results in the 1D histogram with 99 bins. In [19], a multi-domain CNN is introduced to distinguish between uncompressed, single-compressed, and double-compressed images. The multi-domain CNN consists of a spatial-domain CNN that receives the input as a three-channel color image and a frequency-domain CNN that accepts the 1D histogram of the DCT coefficients as the input. The two feature vectors obtained from the two CNNs are combined, thereby making the final decision on the JPEG compression. Because these two methods train the CNN models with the DCT coefficients, they use a 1D convolutional CNN whose input is not a 2D image. Barni et al. [20] proposed a 2D convolutional CNN-based JPEG compression detector that can accept DCT images as the CNN input. They also performed an experiment to compare the effects of three different inputs (i.e., mean-subtracted image, noise residual image, and DCT feature image) for both aligned and non-aligned double JPEG compressed images.

Some approaches that deal with content-changing alterations have been introduced as well. Bondi et al. [21] presented a CNN for the detection of the splicing region by performing camera model identification. In this work, the forgery is defined as an image created by pasting one or more patches onto a pristine image, and it is assumed that the patches and the image have been taken from different camera models. The input image is first divided into several non-overlapping patches, and each patch is fed into the CNN to be classified into its corresponding camera model. Then, a clustering algorithm is employed to estimate the splicing region by using the patch-wise classified labels and the confidence values obtained from the CNN. Finally, a CNN model that performs a two-stage classification for the detection of counter-forensic source anonymization attacks is proposed in [22]. At the first stage, the CNN model determines whether the image is authentic or counter-forensically modified. Then, the model fulfills a multi-class classification to identify the specific attack among seam carving, fingerprint copying, and adaptive photo response non-uniformity (PRNU) denoising.

3. Materials and methods

In this section, we first comment on the GLCM, including the problem of using the original image as the CNN input and how the GLCM can deal with this problem. After describing the acquisition of the GLCM, the next subsection explains the architecture of the proposed network and the loss function.

3.1. Gray level co-occurrence matrix

Images of different resolutions should be resized to a specific size to be fed into the CNN because it only accepts a fixed-size input. Image resizing and cropping are the common techniques to do so. However, when the image resolution is changed by using an image resizing technique, the pixel intensities may change slightly, which can damage the manipulation traces for CE forensics. For example, the unaltered image in Fig. 1(a) and its contrast-enhanced version in Fig. 1(b) are resized by the same scale using bicubic interpolation, resulting in the two outputs shown in Fig. 1(c) and (d), respectively. Fig. 1(e) and (f) illustrate the randomly cropped images of Fig. 1(a) and (b), and the corresponding 1D histograms are depicted in Fig. 2. The histogram of the resized and contrast-enhanced image in Fig. 2(d) no longer has as many peaks and zeros as that of the original-sized contrast-enhanced image in Fig. 2(b). In Table 1, the two features employed in the conventional CE forensic methods [5,6] (i.e., the high-frequency energy metric F and the number of zero-gap bins N) are provided to show how the resizing process affects the traceable features. In the case of the original-sized images, the differences between the unaltered image and the manipulated one are large enough for both features, which allows the forgery to be distinguished from the pure images. However, the conspicuous differences disappear after resizing, which indicates that image resizing negatively affects the manipulation traces for CE forensics.

Although F and N of the cropped images are still reliable for identifying the authenticity, as shown in Table 1, image cropping could be a cumbersome process in a real-world scenario, because the input images are cropped into several patches and the cropped patches should be repeatedly fed into the CNN one by one to detect the manipulation. Furthermore, features that are extracted from the original image are more related to the image content than to the trace of manipulation. Thus, instead of the images, we utilize GLCMs, which are more forensically informative, as the CNN input in the proposed method.

The GLCM represents the 2D distribution of pixel intensity pairs consisting of the intensity values of a pixel and its neighboring pixels. Given a grayscale image I of size M × N and an offset (Δx, Δy), the GLCM is obtained by accumulating the occurrences of a gray-value pair (i, j) of two pixels separated by the offset. Ignoring the computation at image boundaries, the GLCM is defined as:

GLCM(i, j) = \sum_{(\Delta x, \Delta y) \in O} \sum_{p=1}^{M} \sum_{q=1}^{N} \mathbb{1}\big( I(p, q) = i \wedge I(p + \Delta x, q + \Delta y) = j \big),    (1)

where \mathbb{1}(·) is an indicator function that returns 1 if the condition in the parentheses is satisfied, and 0 otherwise. The GLCM offers the following three advantages: (1) the GLCM is obtained by using the pixel intensities of an input image, without any changes to the intensity values; (2) the GLCM represents the relationships between the intensities of each pixel and the other pixels in its local neighborhood, which can be useful information for CE forensics; and (3) the GLCM can always be obtained with the same size, even for input images of different resolutions. We first compute the GLCMs with eight different offsets, O = {(0, 1), (0, −1), (1, 0), (−1, 0), (1, 1), (1, −1), (−1, 1), (−1, −1)}, which represent the relations between the center pixel and the pixels in its 8-directional neighborhood; the element-wise sum of the eight GLCMs then yields one accumulated GLCM. In our method, the quantization level of the GLCM is set to 256, whereby the pixel intensity value of the center pixel i and that of its neighboring pixel j in (1) are in the range of [0, 255], and therefore the size of the GLCM is fixed to 256 × 256. Then, the accumulated 1-channel GLCM is fed into the CNN instead of the input image. To verify the effectiveness of using the GLCM, we compared the performance of deep learning-based CE forensic models trained using three different types of inputs: the resized image, the cropped image, and the GLCM of an image. The corresponding experimental results will be discussed in Section 5.1.
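As an illustration of Eq. (1), the following sketch builds the accumulated 256 × 256 GLCM from an 8-bit grayscale NumPy array using the eight offsets listed above; the vectorized indexing is an implementation choice, not part of the original description.

```python
import numpy as np

# Offsets O covering the 8-directional neighborhood of Section 3.1.
OFFSETS = [(0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (1, -1), (-1, 1), (-1, -1)]

def accumulated_glcm(img, levels=256):
    """Accumulated GLCM of Eq. (1) for an 8-bit grayscale image.

    The eight single-offset GLCMs are summed element-wise, so the result
    is always `levels` x `levels` (256 x 256 here) regardless of the input
    resolution. Pixels whose offset neighbor falls outside the image are
    ignored, as in the text.
    """
    img = np.asarray(img, dtype=np.int64)
    glcm = np.zeros((levels, levels), dtype=np.int64)
    m, n = img.shape
    for dx, dy in OFFSETS:
        # Valid region where both the pixel and its offset neighbor exist.
        p0, p1 = max(0, -dx), min(m, m - dx)
        q0, q1 = max(0, -dy), min(n, n - dy)
        i = img[p0:p1, q0:q1].ravel()                        # center intensities
        j = img[p0 + dx:p1 + dx, q0 + dy:q1 + dy].ravel()    # neighbor intensities
        np.add.at(glcm, (i, j), 1)                           # accumulate pair counts
    return glcm
```

Because the offset set contains each direction together with its opposite, the accumulated matrix is symmetric, and most of its mass lies near the main diagonal, as discussed for Fig. 3.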
Fig. 3 shows the GLCMs for the unaltered, contrast-enhanced, and counter-forensic-attack images. The GLCMs of the contrast-enhanced images in Fig. 3(b)–(d) have conspicuous traces, including transitions from/to peaks and zeros, constructing grid-like patterns. On the other hand, there is no noticeable difference in appearance between the GLCM of the pure image in Fig. 3(a) and the GLCMs of the images processed by anti-forensic techniques, which are shown in Fig. 3(e)–(g). Thus, conventional handcrafted features selected by human observation are not sufficient to distinguish unaltered images from anti-forensic-attack images. In the proposed CNN-based CE forensic method, the CNN first fulfills classification with the use of initial features, and the classification loss is back-propagated to update the weights of the convolution kernels.


Fig. 1. Original-sized images, corresponding resized images, and corresponding randomly cropped images. (a) Unaltered image, (b) contrast-enhanced image after
gamma correction is applied, (c) resized image of (a), (d) resized image of (b), (e) cropped image of (a), and (f) cropped image of (b).

Fig. 2. 1D histograms of the images in Fig. 1. (a) Histogram of Fig. 1(a), (b) histogram of Fig. 1(b), (c) histogram of Fig. 1(c), (d) histogram of Fig. 1(d), (e) histogram
of Fig. 1(e), and (f) histogram of Fig. 1(f).

Fig. 3. Gray level co-occurrence matrix of (a) unaltered image, contrast-enhanced images after (b) histogram stretching (HS), (c) gamma correction (GC), and
(d) S-curve mapping (SM), and images manipulated by the following counter-forensic attacks: (e) Gaussian dithering (GD) [11], (f) internal bit depth increasing
(IBD) [12], and (g) total variation optimization (TVO) [13].

Table 1
Comparison of CE forensic features before and after image resizing and cropping.

Feature   Original-sized          Resized               Cropped
          Unaltered   Enhanced    Unaltered   Enhanced  Unaltered   Enhanced
F         0.39        365.11      0.41        0.23      0.26        108.15
N         1           22          1           1         1           22

Features for CE forensics: F = high frequency energy, N = number of gap bins.


Then, the updated convolution kernels extract new filter responses (i.e., local or global features of the GLCM for CE forensics), and these features are utilized to perform the classification at the next iteration. By repeating this classification and updating of the kernel parameters, the proposed CNN can learn, from the GLCMs, further features that are hardly visible to human eyes. The proposed CE forensic method, therefore, achieves reliable performance in detecting both the commonly contrast-enhanced images and those processed by counter-forensic attacks.

Notably, most of the information is distributed along the diagonal line of the GLCM, as shown in Fig. 3, because the intensity values of adjacent pixels tend to be similar. Inspired by this observation, we additionally trained the proposed CNN using the partial GLCM that is cropped along the diagonal direction; the resulting performance will be described in Section 5.2.
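The exact geometry of this 64 × 256 partial GLCM is not spelled out above, so the following sketch shows one plausible reading: a 64-row band centered on the main diagonal, zero-padded where the band leaves the matrix. The band width and the padding rule are assumptions for illustration only.

```python
import numpy as np

def diagonal_band(glcm, width=64):
    """One possible diagonal crop of a 256 x 256 GLCM into a `width` x 256 band.

    Column j of the output holds the GLCM entries (i, j) with
    i in [j - width // 2, j + width // 2); entries outside the matrix are
    left as zeros. This is an assumed reading of "cropped along the
    diagonal direction", not the authors' exact procedure.
    """
    levels = glcm.shape[0]
    half = width // 2
    band = np.zeros((width, levels), dtype=glcm.dtype)
    for j in range(levels):
        lo, hi = j - half, j + half
        src_lo, src_hi = max(0, lo), min(levels, hi)
        band[src_lo - lo:src_hi - lo, j] = glcm[src_lo:src_hi, j]
    return band
```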
3.2. CNN architectures for CE forensics

The proposed CNN architecture is presented in Fig. 4. The grayscale GLCM image with a size of 256 × 256 is fed into the network, and then low-level features are extracted using convolutional kernels. The first convolutional layer convolves the GLCM with 50 kernels of size 7 × 7. The size of the output is 256 × 256 × 50, which means that the number of feature maps is 50 and the resolution of each feature map is 256 × 256. Following every convolution layer, the rectified linear unit (ReLU) is used as the element-wise non-linear activation function in our work. After ReLU activation, a max pooling operation is performed with a window size of 2 × 2 and a stride of 2, leading to the same number of feature maps with a decreased spatial resolution of 128 × 128 × 50. The feature maps become inputs for the next convolutional layer, and the same operations (i.e., convolution, activation, and max pooling) are repeated. Then, an output with a size of 32 × 32 × 50 passes through two fully connected layers of length 100 and is fed into the last layer of the CNN, where the softmax function is used for classification. At the end of every fully connected layer, batch normalization (BN) [23] and ReLU activation are applied. Since BN can act as a regularizer, the dropout layer and other regularization terms are not used in this work. The softmax function, which returns the class probability, is defined as follows:

\sigma(z_j) = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}  for j = 1, ..., K,    (2)

where K is the number of classes, z is the vector consisting of the K output values of the last layer, and the softmax probability of the jth class, \sigma(z_j), is in the range of [0, 1]. The proposed CNN is trained for both two-class and three-class classification. The two-class (K = 2) classification detects manipulated images from unaltered images, whereas the three-class (K = 3) classification categorizes images into three groups: unaltered images, common contrast-enhanced images, and those attacked by counter-forensic techniques.

The Adam optimizer is utilized as the optimization solver with the hyper-parameters recommended in [24]. The objective loss function is formulated by using the cross-entropy loss of the softmax class probability, defined as:

L = -\sum_{j=1}^{K} y_j \log \sigma(z_j),    (3)

where y_j ∈ {0, 1} is the ground truth label of the jth class.
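The sketch below illustrates this architecture and loss in tf.keras (an assumption; the authors' released code uses plain TensorFlow). Three convolution/pooling blocks are assumed, since three 2 × 2 poolings reduce the 256 × 256 GLCM to the 32 × 32 × 50 output stated above, and a 7 × 7 kernel with 'same' padding is assumed for every convolutional layer.

```python
import tensorflow as tf

def build_ce_forensics_cnn(num_classes=3):
    """Sketch of the CNN of Fig. 4 with the softmax of Eq. (2) and the
    cross-entropy loss of Eq. (3)."""
    model = tf.keras.Sequential()
    # First conv block takes the 256 x 256 x 1 GLCM.
    model.add(tf.keras.layers.Conv2D(50, 7, padding="same", activation="relu",
                                     input_shape=(256, 256, 1)))
    model.add(tf.keras.layers.MaxPooling2D(2))          # 128 x 128 x 50
    for _ in range(2):                                  # two more conv/pool blocks
        model.add(tf.keras.layers.Conv2D(50, 7, padding="same", activation="relu"))
        model.add(tf.keras.layers.MaxPooling2D(2))      # 64 x 64, then 32 x 32
    model.add(tf.keras.layers.Flatten())                # 32 * 32 * 50 features
    for _ in range(2):                                  # two FC layers of length 100
        model.add(tf.keras.layers.Dense(100))
        model.add(tf.keras.layers.BatchNormalization()) # BN acts as the regularizer
        model.add(tf.keras.layers.ReLU())
    model.add(tf.keras.layers.Dense(num_classes, activation="softmax"))  # Eq. (2)

    # Cross-entropy loss of Eq. (3); Adam with the defaults of [24].
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Setting num_classes to 2 or 3 corresponds to the two-class and three-class training modes described above.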
Fig. 5 shows the visualization of the 50 convolution kernels in the first convolutional layer of the aforementioned CNN trained using the GLCM as the input. These filters look structured rather than random; some filters have strong horizontal/vertical edges and cross-like patterns, while others have relatively flat patterns. Note that the GLCMs of contrast-enhanced images have grid-like patterns, while that of the unaltered image does not, as shown in Fig. 3. Thus, the GLCMs of unaltered images show high and continuous filter responses after convolution with the relatively flat filters, while the convolution kernels with strong edges generate high responses when they are convolved with the grid-like regions in the GLCMs of contrast-enhanced images. Examples and analysis of the convolutional kernels and their corresponding feature maps will be discussed in Section 5.1.

4. Experimental setup

Two experiments were conducted to evaluate the performance of the proposed CNN-based CE forensic method. First, a performance comparison was conducted among three deep neural network models for CE forensics that utilize different types of inputs: resized images, cropped images, and the GLCM of the images. In the second experiment, the forgery detection performance of the proposed method is compared with that of three state-of-the-art CE forensic methods. In this section, the training, validation, and test datasets are first explained, and then the experimental cases and comparison targets are introduced. The implementation details, including the parameter settings and training strategies, are described in the last subsection.

4.1. Dataset

To evaluate the CE detection performance of the proposed method, we generated a dataset by using 5000 randomly chosen images from the MS COCO dataset [25], which is frequently used for object detection and segmentation purposes. After converting the images to the YUV color space, the Y-channels of the images were assigned as the unaltered image data. Then, each of the 5000 Y-channel images was manipulated by three popular CE techniques, histogram stretching (HS), gamma correction (GC), and S-curve mapping (SM), resulting in a total of 15,000 contrast-enhanced images. In HS, the input pixel values were linearly mapped such that 1% of the total pixels were saturated at an intensity value of 0 and another 1% at an intensity value of 255. In the case of GC, we split the 5000 images into four groups of 1250 images and applied GC with a different gamma value (γ = 0.5, 0.8, 1.2, 1.5) to each group. Also, three recently proposed counter-forensic attacks, Gaussian dithering (GD) [11], internal bit-depth increasing (IBD) [12], and total variation optimization (TVO) [13], were applied to the Y-channel images to generate the counter-forensic-attack images. For all seven types of images, the GLCMs were computed and used as inputs for the proposed network. In total, we obtained 35,000 GLCM data acquired from 5000 unaltered, 15,000 enhanced, and 15,000 attacked images. For each type of GLCM data, we randomly selected 80% for training and validation and assigned the remaining 20% as the test dataset.
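For illustration, the sketch below applies two of the three CE manipulations to a Y-channel array. The 1% saturation and the gamma values follow the description above; the S-curve mapping and the counter-forensic attacks [11–13] are omitted because their exact parameterizations are given only in the cited works, and the percentile-based implementation of HS is an assumption.

```python
import numpy as np

def histogram_stretch(y, sat=0.01):
    """HS: linear mapping that saturates about `sat` of the pixels at 0 and at 255."""
    lo, hi = np.percentile(y, [100 * sat, 100 * (1 - sat)])
    out = (y.astype(np.float64) - lo) * 255.0 / max(hi - lo, 1e-6)
    return np.clip(np.round(out), 0, 255).astype(np.uint8)

def gamma_correction(y, gamma):
    """GC with gamma in {0.5, 0.8, 1.2, 1.5}, applied to the [0, 1]-normalized Y channel."""
    out = 255.0 * (y.astype(np.float64) / 255.0) ** gamma
    return np.clip(np.round(out), 0, 255).astype(np.uint8)
```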
4.2. Experimental cases and comparison targets

Regarding the first experiment, the three different types of CNN inputs, the resized images, the cropped images, and the GLCM of the images, were fed into the same network architecture, and their corresponding CE forensic models were trained by means of transfer learning, which performs fine-tuning using a pre-trained model. We utilized Inception-ResNet-v2 (IRN-v2) [26], one of the most popular base networks in the research field of object classification and detection, as the pre-trained model, and the performance evaluation was conducted under the first and second experimental cases shown in Table 2. In Cases 1 and 2, we measured the performance of the binary classification between unaltered and manipulated images. However, the forgery of each case is of a different type: the CNN models tried to detect the contrast-enhanced and the counter-forensic-attack images in the first and second experimental cases, respectively. In Case 1, four types of GLCM, obtained from the unaltered images and the images processed by HS, GC, and SM, were used. In Table 2, the GLCMs of the unaltered images are labeled ''U'', and those of the contrast-enhanced images are labeled ''E''. Similarly, Case 2 was also performed using four types of data, but in this case, the GLCMs of the unaltered images and those of the images attacked by the three anti-forensic techniques, labeled ''A'', were used to evaluate the detection performance.


Fig. 4. CNN architecture of the proposed method.

Fig. 5. Visualization of the convolution kernels in the first convolutional layer.

Table 2
Experimental cases with different data distributions.

                        Unaltered   HS     GC     SM     GD     IBD    TVO
# of data   Train+Val   4000        4000   4000   4000   4000   4000   4000
            Test        1000        1000   1000   1000   1000   1000   1000
Case 1      2-class     U           E      E      E      –      –      –
Case 2      2-class     U           –      –      –      A      A      A
Case 3      2-class     U           F      F      F      F      F      F
Case 4      3-class     U           E      E      E      A      A      A

Common CE techniques: HS = histogram stretching, GC = gamma correction, SM = s-curve mapping.
State-of-the-art counter-forensic methods: GD = Gaussian dithering [11], IBD = internal bit depth increasing [12], TVO = total variation optimization [13].
Types of image used for each case: U = unaltered images, E = contrast-enhanced images, A = anti-forensic-attack images, F = forgery images (E ∪ A).

In the next experiment, the forgery detection performance of the proposed method was compared with that of the three state-of-the-art CE forensic methods. In contrast to the first experiment, two additional experimental cases, indicated by Cases 3 and 4 in Table 2, were added. In the third case, all seven types of GLCM were divided into the following two classes: unaltered images and images assigned with ''F''. In this case, the GLCMs of the common contrast-enhanced images and those of the images that had passed through the counter-forensic techniques were considered together as forgeries. While the two-class classification was performed for the three previously mentioned cases, Case 4 evaluated the performance of the three-class classification. In this case, all of the GLCM data were classified into the following three classes: unaltered images, common contrast-enhanced images, and those generated by anti-forensic attacks.

The comparison targets for the second experiment [5–7] were implemented in MATLAB. The two handcrafted features, the high-frequency energy F and the number of gap bins N, are calculated according to the equations provided in [5] and [6], respectively. Then, each method identifies the authenticity of input images by using a specific threshold. In [6], the threshold value for N is predetermined as 0, which means the image is verified as contrast-enhanced when there are one or more gap bins. Since the threshold value for F has not been specified in [5], it was explored heuristically among 20 different values from 0 to 2 with equal intervals. After measuring the accuracy of classifying the unaltered and contrast-enhanced images while varying the threshold, the best-performing threshold of 1.4 was finally selected for further experiments. The images processed by the counter-forensic attacks were not considered in the determination of the threshold, because the conventional method [5] was introduced to discriminate unaltered images from common contrast-enhanced images only. The CE forensic methods in [5] and [6] always classify the images into two categories, the unaltered image versus the forgery. Thus, these two methods were evaluated for the first three experimental cases only. On the other hand, the SVM training-based CE forensic method [7] can perform a three-class classification as well as a binary classification, and thus the performance of this method was assessed in all four experimental cases. The handcrafted feature proposed in [7] was extracted by using the source code which is downloadable on the authors' website [27].


After the feature extraction, the SVM classifier was implemented by using MATLAB built-in functions. The SVM classifier was trained and tested using the same dataset used for the proposed method.

4.3. Implementation details

This section discusses the implementation details, including hyper-parameter settings and training strategies. For each type of input in the first experiment, two models trained using different ways of transfer learning were utilized. One model (IRN-v2-LL) was obtained by replacing the last layer of the IRN-v2 with a new fully connected layer connected to two output nodes for the binary classification, and only the parameters in this last layer were trained; the other pre-trained parameters of IRN-v2-LL were not changed. Fig. 6 illustrates the network architecture of the IRN-v2-LL. The colored blocks in Fig. 6 are the pre-defined modules of the IRN-v2, and a detailed description of these modules is given in [26]. The network architecture of the other model (IRN-v2-FT) is the same as shown in Fig. 6, but all of the parameters in this model were fine-tuned from the IRN-v2. We used the default hyper-parameter setting of the IRN-v2 in the transfer learning, except for the preprocessing function that performs random cropping. For both models, the training session was stopped after 5000 iteration steps.
Regarding the training of our proposed CNN structure shown in Fig. 4, the weights in the network were initialized with Xavier initialization [28], and there was no pre-training stage. We trained the model using the Adam optimizer [24] with β1 = 0.9, β2 = 0.999, and an initial learning rate of 0.001, which are the default settings recommended in [24]. Also, mini-batch training was carried out with different batch sizes depending on the experimental cases. In Cases 1 and 2, which use four types of GLCM as the dataset, we trained the model with a batch size of 40 containing 10 GLCM inputs of each type. When training the CNN in the remaining two experimental cases, 70 GLCM inputs were grouped into one batch since all seven types of GLCM were used in these cases. To determine the learning rate schedule, we trained the model and monitored the convergence of the classification loss on the validation data by using k-fold cross validation with k = 10. As mentioned above, there were seven types of GLCM input, each of which contained 5000 GLCMs. For each type, the 4000 GLCM inputs used for cross validation were equally divided into 10 folds, and training and validation of the model were performed 10 times. At each iteration, a single fold was held out for validation, while the remaining 9 folds were used for training. We trained the model with an initial learning rate of 0.001 for the first 5 epochs and decreased the learning rate by a factor of 10 for the next 10 epochs. After setting the learning rate schedule, the data used for cross validation was randomly split in a ratio of 3:1 for training and validation. By training the proposed network with the learning rate schedule mentioned above, the model with the minimum classification loss on the validation set was selected as the final model. The training data was not shuffled between epochs in our work. The proposed method was implemented on Tensorflow [29], an open-source deep learning library recently released by Google, and the proposed network was trained on a single NVIDIA Titan X Pascal GPU.

5. Experimental results

For the quantitative analysis of the proposed CNN-based CE forensic method, a comparison of the receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) was performed over all of the experimental cases. The ROC curve plots the true positive rate (TPR) versus the false positive rate (FPR), where the TPRs and FPRs are obtained by varying the detection threshold. In this work, the forged and unaltered images were considered as positive and negative, respectively, because the main purpose of a CE forensic method is the detection of manipulated images. In addition, we compute the TPR, the true negative rate (TNR), and the total accuracy (ACC) for each CE forensic method in the second experiment using each method's decision criterion. The TPR, TNR, and ACC are defined as follows:

TPR = TP / (TP + FN),    (4)
TNR = TN / (TN + FP),    (5)
ACC = (TP + TN) / (TP + FN + TN + FP),    (6)

where TP, TN, FP, and FN represent the numbers of true positive, true negative, false positive, and false negative samples, respectively. A proficient forgery detector should exhibit a high ACC, and this is possible only when both the TPR and the TNR are high. To calculate the previously mentioned metrics, the final predicted class of the proposed CNN-based CE forensic method is determined as the one with the maximum softmax probability. The resulting TPR, TNR, and ACC values are provided in Tables 3–6.
the validation data, by using the k-fold cross validation with 𝑘 = 10. cropped input images, and the GLCM of input images, respectively.
As mentioned above, there were seven types of GLCM input, each of To compare the manner in which the appearance of the feature maps
which contained 5000 GLCMs. For each type, 4000 GLCM inputs that change according to the image content and the manipulation, the filter
were used for the cross validation were equally divided into 10 folds, responses of four different images, two unaltered images and their
and subsequent training and validation of the model were performed contrast-enhanced versions, are provided. The intra-class similarity and
10 times. At each iteration, a single fold was held-out for validation, the inter-class similarity of the feature maps should be high and low,
while the remaining 9 folds were used for training. We trained the respectively, to achieve an enhanced forgery detection performance.
model with an initial learning rate of 0.001 for the first 5 epochs and That is, in Figs. 7–9, the feature maps for the unaltered images in the
decreased the learning rate by a factor of 10 for the next 10 epochs. After columns (b) and (d) should be similar to each other, and the appearances
setting the learning rate schedule, the data used for cross validation was of those for the forgeries in the columns (c) and (e) should also be
randomly split into a ratio of 3:1 for training and validation. By training similar.
the proposed network with the learning rate schedule mentioned above, The feature maps shown in Figs. 7 and 8 are image-related; they
the model with the minimum classification loss on the validation set seem to be blurred, brightness-changed, or edge images. Thus, the filter
was selected as the final model. The training data was not shuffled responses of the columns (b) and (c) are similar, and the appearance of
between epochs in our work. The proposed method was implemented on those in the column (d) are similar to those in the column (e), which
Tensorflow [29], an open source deep learning library recently released implies that these feature maps depend on the image content rather
by Google, and the proposed network was trained on a single NVIDIA than the manipulation. On the other hand, the feature maps in Fig. 9
Titan X Pascal GPU. obtained from the CNN that had been trained using the GLCM as the
input show different patterns. The first filter in Fig. 9(a) is relatively
5. Experimental results flat, which exhibits high responses when it is convolved with the flat
regions in the GLCM. Thus, the filter responses from the GLCM of the
For the quantitative analysis of the proposed CNN-based CE forensic unaltered images in Fig. 9(b) and (d) are continuous, whereas those from
method, a comparison of the receiver operating characteristics (ROC) the GLCM of the contrast-enhanced images in Fig. 9(c) and (e) showed
curves and the area under the ROC curve (AUC) was performed over uneven responses. Also, the feature maps generated using the second
all of the experimental cases. The ROC curve plots the true positive and third convolutional kernels in Fig. 9(a) exhibited high responses
rate (TPR) versus the false positive rate (FPR), where the TPRs and the on the grid-like regions in the GLCM of the contrast-enhanced images.
FPRs are obtained by varying the detection threshold. In this work, the As shown in Fig. 9, the CNN that is trained using the GLCM generates
forged and unaltered image were considered as positive and negative, feature maps with higher intra-class and lower inter-class similarities
respectively, because the main purpose of a CE forensic method is the compared with the other input types. This implies that the GLCM is an
detection of the manipulated images. In addition, we compute the TPR, effective CNN input for the detection of the CE manipulation.


Fig. 6. Network architecture of IRN-v2-LL and IRN-v2-FT.

Fig. 7. Convolutional kernels (a) of the CNN trained using resized input images and their corresponding feature maps generated from (b) unaltered image 1, (c)
contrast-enhanced image 1, (d) unaltered image 2, and (e) contrast-enhanced image 2.

Fig. 10 shows the ROC curves and their corresponding AUC values for the models learned through transfer learning from IRN-v2. Compared with the ROC curves from IRN-v2-LL in Fig. 10(a) and (b), those from IRN-v2-FT in Fig. 10(c) and (d) provide better performance in terms of AUC for the CE forensic model using the GLCM input. As shown in Fig. 10(a) and (c), the models trained using the GLCM input outperform those with the other input types in Case 1. However, as shown in Fig. 10(b) and (d), the GLCM does not show much superiority in Case 2, which evaluates the ability to distinguish the images manipulated by counter-forensic attacks from the unaltered images; in fact, even the models trained using the GLCM in Case 2 yield unsatisfactory performance, because the traceable features in the GLCM have faded out while passing through the deep layers of the IRN-v2 architecture. A more detailed analysis will be given in Section 5.2.2 together with the result of our proposed CNN for the same experimental case. Although the advantage of using the GLCM as the CNN input is rarely visible in Case 2, the ROC curves in Fig. 10(a) and (c) imply that extracting the manipulation clues from the GLCM is more effective than tracing them in the resized or cropped images.

5.2. Performance comparison with the state-of-the-art CE forensic methods

To verify the practical effectiveness of the CNN-based CE forensic approach, a comparative evaluation of the proposed method and the three state-of-the-art CE forensic algorithms was performed under the four experimental cases mentioned in Section 4.2. In this experiment, the partial GLCM with a size of 64 × 256 was acquired by cropping the GLCM along the diagonal direction, and it was utilized as another CNN input, as mentioned in Section 3.1. The performance of the CE forensic model trained using the cropped GLCM was also compared with the others in all of the experimental cases. The ROC curves are depicted in Fig. 11. Since the three-class classification was done in Case 4, two sets of ROC curves are provided, as shown in Fig. 11(d) and (e), which were drawn considering the contrast-enhanced and the counter-forensic-attack images as positive samples, respectively. A detailed explanation of the experimental result for each case is given in the following subsections.

5.2.1. Case 1: Unaltered images vs. Contrast-enhanced images
In the first case, the test dataset consisted of the four types of GLCM input obtained from the unaltered images and the contrast-enhanced images processed by HS, GC, and SM. The ROC curves for Case 1 are shown in Fig. 11(a). Both ROC curves depicting the performance of the proposed CNN models, trained using the original-sized and the partial GLCM, show significantly high performance.
GLCM show significantly high performance. Table 3 compares the
To verify the practical effectiveness of the CNN-based CE forensic detection performance in terms of TPR, TNR, and ACC for each type
approach, a comparative evaluation of the proposed method and the of image and the highest ACC value and the second-highest one are


Fig. 8. Convolutional kernels (a) of the CNN trained using cropped input images and their corresponding feature maps generated from (b) unaltered image 1,
(c) contrast-enhanced image 1, (d) unaltered image 2, and (e) contrast-enhanced image 2.

Fig. 9. Convolutional kernels (a) of the CNN trained using GLCM of input images and their corresponding feature maps generated from (b) unaltered image 1,
(c) contrast-enhanced image 1, (d) unaltered image 2, and (e) contrast-enhanced image 2.

Table 3 compares the detection performance in terms of TPR, TNR, and ACC for each type of image; the highest ACC value is written in bold and the second-highest is underlined. Although the proposed method took first place, showing the highest mean accuracy, the other methods yielded acceptable performance as well, because commonly contrast-enhanced images retain sufficient traces of CE manipulation.

5.2.2. Case 2: Unaltered images vs. Anti-forensic-attack images
In contrast to the results of Case 1, Table 4 and the ROC curves in Fig. 11(b) show that all conventional methods are fooled in Case 2, since the images attacked by anti-forensic techniques no longer carry the striking evidence. As shown in Table 4, the forensic method in [5] shows very low TPR values for images attacked by GD [11] and IBD [12], which means that most attacked images are classified as unaltered ones. Similarly, the conventional method [6] suffers from difficulties in detecting the TVO-attacked images. Besides, for almost every type of image, the CE forensic methods in [6] and [7] have TPR and TNR values of about 0.5, indicating that these algorithms classify almost randomly regardless of the input image. Meanwhile, the proposed algorithm outperforms the conventional methods in terms of ACC, having both high TPR and high TNR. Note that both the proposed method and the conventional method introduced by De Rosa et al. [7] utilize the GLCM in the first stage of CE detection. Nevertheless, the two methods exhibit significantly different performance. In the conventional method [7], the 1D histogram of the variance values calculated from every row of the GLCM is considered as the only feature space for the SVM classifier.


Fig. 10. ROC curves and their corresponding AUC values for the models learned by transfer learning from IRN-v2. (a) IRN-v2-LL in case 1, (b) IRN-v2-LL in case 2,
(c) IRN-v2-FT in case 1, and (d) IRN-v2-FT in case 2.

By contrast, in the proposed method, the effective features of the GLCM input are extracted by the convolutional kernels, and the parameters for feature extraction and classification are learned simultaneously by iterative end-to-end training. This integrated process enables the proposed method to extract not only the visually distinguishing features of the GLCM but also features that are hard to detect based on human perception. The experimental results verified that the proposed CNN-based CE forensic model exhibits a high performance in the detection of images processed by anti-forensic attacks, and thus the deep features trained by the proposed CNN are more powerful than the handcrafted features in [7].

As shown in Fig. 11(b), the red ROC curve and the AUC value of the proposed method are higher than the black curve and its AUC, which correspond to the CE forensic model trained with the cropped GLCM. This indicates that the features of the whole-sized GLCM are more appropriate than those of the diagonally cropped GLCM, thereby yielding a better performance. Although it seems that little or no information exists outside of the diagonal region, traceable CE forensic features do exist there, and the proposed CNN using the whole-sized GLCM can effectively extract them through the iterative training scheme. In addition, the CE forensic model trained with the proposed CNN structure provides greatly enhanced performance compared with the models fine-tuned from the IRN-v2 in Fig. 10(b) and (d). In general, the deeper layers of a CNN structure have wider receptive fields, thereby allowing the CNN to capture more global contextual information. Indeed, the contextual information that can be obtained from DNN structures has shown impressive performance in the research field of object recognition and segmentation. From the perspective of CE forensics, however, it is more important to find the manipulation clues hidden in the local regions of the GLCM than to observe the traces with global contextual information. Also, a blurring effect arises from the numerous mathematical operations in the IRN-v2, such as the convolution and pooling layers, and this may negatively affect the local information, resulting in a performance degradation in the classification of the images manipulated by counter-forensic attacks from the unaltered images.

Table 4
Comparison of detection performance (Case 2: Unaltered images vs. Anti-forensic-attack images).

Method               TNR (Unaltered)   TPR (GD)   TPR (IBD)   TPR (TVO)   ACC
Stamm and Liu [5]    0.896             0.013      0.135       0.681       0.432
Cao et al. [6]       0.558             0.467      0.463       0.035       0.465
De Rosa et al. [7]   0.506             0.520      0.509       0.586       0.530
Cropped GLCM         0.777             0.975      0.871       0.997       0.905
Proposed (GLCM)      0.801             0.981      0.912      0.998        0.923

5.2.3. Case 3: Unaltered images vs. Manipulated images
In the third experiment, the contrast-enhanced and anti-forensic-attack images were considered together as manipulated images. The TPR, TNR, and ACC in classifying the unaltered images and the forgeries are given in Table 5, and the ROC curves for Case 3 are depicted in Fig. 11(c). The threshold-based methods [5,6] provided exactly the same TPR and TNR as in Cases 1 and 2. The other two methods, the SVM training-based one [7] and the proposed CNN-based one, however, had different TPR and TNR because their CE forensic models were re-trained using the new dataset, which differs from those of Cases 1 and 2. Also, all of the ROC curves in Fig. 11(c) are different from those in Fig. 11(a) and (b) because the positive dataset in Case 3 contains both the enhanced and attacked images. In Fig. 11(c), the red and black ROC curves that correspond to the proposed method show higher performance than the conventional methods, yielding higher AUC values. In Table 5, for the conventional methods, the TPR values for detecting the forgeries obtained by GD [11], IBD [12], and TVO [13] are lower than those for detecting the other types of images. This implies that the conventional methods are not robust against counter-forensic attacks. However, the proposed method exhibits high performance in the detection of the attacked images as well, showing the highest ACC value written in bold.

5.2.4. Case 4: Unaltered images vs. Contrast-enhanced images vs. Anti-forensic-attack images
The last experimental case was to classify the test images into three classes: unaltered images, contrast-enhanced images, and anti-forensic-attack images.


Fig. 11. ROC curves and their corresponding AUC values for the proposed CNN-based CE forensic methods and state-of-the-art CE forensic methods. (a) Case 1:
unaltered vs. contrast-enhanced images, (b) Case 2: unaltered vs. anti-forensic-attack images, (c) Case 3: unaltered vs. manipulated images, and Case 4: unaltered
vs. (d) contrast-enhanced vs. (e) anti-forensic-attack images.

Table 5
Comparison of detection performance (Case 3: Unaltered images vs. Manipulated images).

Method               TNR (Unaltered)   TPR (HS)   TPR (GC)   TPR (SM)   TPR (GD)   TPR (IBD)   TPR (TVO)   ACC
Stamm and Liu [5]    0.896             0.969      1          1          0.013      0.135       0.681       0.671
Cao et al. [6]       0.558             1          0.977      0.899      0.467      0.463       0.035       0.628
De Rosa et al. [7]   0.672             0.888      0.939      0.963      0.339      0.350       0.423       0.653
Cropped GLCM         0.740             0.994      1          1          0.973      0.895       0.998       0.943
Proposed (GLCM)      0.810             0.989      0.999      0.999      0.984      0.905       0.998       0.955

Table 6
Comparison of detection performance (Case 4: Unaltered images vs. Contrast-enhanced images vs. Anti-forensic-attack images). Columns correspond to the true image type; rows give the proportion predicted as each class.

Method               Predicted    Unaltered   HS      GC      SM      GD      IBD     TVO     ACC
De Rosa et al. [7]   Unaltered    0.478       0.075   0.033   0.031   0.459   0.447   0.410
                     Enhanced     0.017       0.806   0.927   0.916   0.012   0.024   0.078   0.671
                     Attacked     0.505       0.119   0.040   0.053   0.529   0.529   0.512
Cropped GLCM         Unaltered    0.753       0.008   0       0       0.024   0.126   0.006
                     Enhanced     0.003       0.988   0.999   0.999   0       0       0       0.940
                     Attacked     0.244       0.004   0.001   0.001   0.976   0.874   0.994
Proposed             Unaltered    0.821       0.011   0.001   0       0.026   0.097   0.005
                     Enhanced     0.001       0.987   0.997   0.999   0       0       0.001   0.954
                     Attacked     0.178       0.002   0.002   0.001   0.974   0.903   0.994

For each method, we computed the confusion matrices on the test dataset, as reported in Table 6. In the confusion matrices, each column corresponds to the real label of the test images, while each row indicates the predicted label. Thus, the diagonal elements written in bold indicate the proportion of correctly classified images.


For example, the values in the ''TPR (Enhanced)'' column represent the proportions of the test data predicted as the contrast-enhanced images. In this column, all of the methods have high rates in the second row and very low rates in the other two rows, indicating that every method detects common contrast-enhanced images fairly well. However, for the classification of the unaltered and attacked images, the conventional and proposed methods show a significant difference in performance. In Table 6, the conventional method [7] has low TNR and TPR values for detecting the unaltered and counter-forensic-attack images, respectively. More specifically, the conventional method showed proportions of correctly classified labels close to 0.5 in the ''TNR'' and ''TPR (Attacked)'' columns; the remaining half are falsely classified as each other, implying that the conventional method had difficulty distinguishing between unaltered images and anti-forensic-attack images. On the other hand, the proposed method provides high accuracy values in both the ''TNR'' and ''TPR (Attacked)'' columns. This can also be seen from the ROC curves in Fig. 11(d) and (e). Since the ROC curve is defined for binary classification, we provide two sets of ROC curves drawn with two different positive datasets: the enhanced images in Fig. 11(d) and the attacked images in Fig. 11(e). The performance of the proposed method is significantly high for both ROC curves in Fig. 11(d) and (e). The conventional method, however, shows high performance only in Fig. 11(d) and yields unsatisfactory performance in the detection of anti-forensic attacks in Fig. 11(e). In other words, the proposed method achieved better performance on all types of test images, especially in discriminating between images manipulated by counter-forensic techniques and unaltered images.
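One plausible way to reproduce this kind of two-curve evaluation is sketched below: each ROC curve treats one manipulated class as the positive set against the unaltered images, using a per-class score from any classifier. The random labels and scores are placeholders that only illustrate the bookkeeping; scikit-learn's roc_curve and roc_auc_score are assumed as an external dependency.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)

# Stand-ins for classifier outputs: y_true holds the real labels
# (0 = unaltered, 1 = contrast-enhanced, 2 = attacked) and `scores`
# holds one softmax-like score per class for every test image.
y_true = rng.integers(0, 3, size=600)
scores = rng.random((600, 3))
scores /= scores.sum(axis=1, keepdims=True)

# Curve analogous to Fig. 11(d): enhanced images positive, unaltered negative.
mask_d = np.isin(y_true, [0, 1])
fpr_d, tpr_d, _ = roc_curve(y_true[mask_d] == 1, scores[mask_d, 1])
auc_d = roc_auc_score(y_true[mask_d] == 1, scores[mask_d, 1])

# Curve analogous to Fig. 11(e): attacked images positive, unaltered negative.
mask_e = np.isin(y_true, [0, 2])
fpr_e, tpr_e, _ = roc_curve(y_true[mask_e] == 2, scores[mask_e, 2])
auc_e = roc_auc_score(y_true[mask_e] == 2, scores[mask_e, 2])

print(auc_d, auc_e)  # ~0.5 here, since the scores are random
```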
6. Conclusion

In this paper, we proposed a novel CNN-based CE forensic method. In our method, the GLCM of the input image is fed into the CNN. Deep features are then extracted from the GLCM by using convolution kernels, and the CE forensic classification is carried out based on these features. Via the back-propagation algorithm, the classification results help guide the updating of the convolution kernel parameters, thereby generating a new feature space for the classification at the next iteration. As such, the proposed CNN simultaneously optimizes the feature extraction and classification in an iterative training scheme. The forgery detection performance of the proposed method was compared to that of three state-of-the-art CE forensic methods on a test dataset consisting of unaltered, common contrast-enhanced, and counter-forensic-attack images. The proposed method outperformed the conventional ones in terms of forgery detection accuracy, especially in the identification of the state-of-the-art counter-forensic attacks based on the 1D histogram and the 2D GLCM. Furthermore, the proposed CNN can be fine-tuned by using an extended dataset including the GLCMs of images processed by any other CE manipulation techniques. Therefore, the proposed network and the deep-learning-based CE forensics framework are applicable to identifying potential anti-forensic attacks that may arise in the future. In conclusion, the proposed CNN-based CE forensic method achieved a significant improvement in CE forgery detection performance, demonstrating robustness against the state-of-the-art anti-forensic techniques.
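For readers who want a concrete picture of the fixed-size network input summarized above, the following minimal sketch computes a gray-level co-occurrence matrix from an 8-bit grayscale image. The pixel offset, 256-level quantization, and normalization used here are assumptions for illustration and do not necessarily match the exact GLCM configuration used in the paper.

```python
import numpy as np

def glcm(gray, dx=1, dy=0, levels=256):
    """Normalized gray-level co-occurrence matrix for one pixel offset.

    `gray` is a 2-D uint8 array. The output is always `levels` x `levels`,
    regardless of the image resolution, which is what makes it convenient
    as a fixed-size CNN input.
    """
    h, w = gray.shape
    src = gray[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    dst = gray[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    m = np.zeros((levels, levels), dtype=np.float64)
    # Accumulate co-occurrence counts of (source gray level, neighbor gray level).
    np.add.at(m, (src.ravel().astype(np.intp), dst.ravel().astype(np.intp)), 1)
    return m / m.sum()  # normalize so the matrix sums to 1

# Example: a random 8-bit image stands in for a real test image.
img = np.random.randint(0, 256, size=(480, 640), dtype=np.uint8)
x = glcm(img)          # 256 x 256 array that could be fed to a CNN
print(x.shape)
```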
Acknowledgements

This work was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (2014-0-00077, Development of global multi-target tracking and event prediction techniques based on real-time large-scale video analysis).
References

[1] A.C. Popescu, H. Farid, Exposing digital forgeries by detecting traces of resampling, IEEE Trans. Signal Process. 53 (2005) 758–767.
[2] W. Luo, J. Huang, G. Qiu, JPEG error analysis and its applications to digital image forensics, IEEE Trans. Inf. Forens. Security 5 (2010) 480–491.
[3] X. Kang, M.C. Stamm, A. Peng, K.J.R. Liu, Robust median filtering forensics using an autoregressive model, IEEE Trans. Inf. Forens. Security 8 (2013) 1456–1468.
[4] H. Yuan, Blind forensics of median filtering in digital images, IEEE Trans. Inf. Forens. Security 6 (2011) 1335–1345.
[5] M.C. Stamm, K.J.R. Liu, Forensic detection of image manipulation using statistical intrinsic fingerprints, IEEE Trans. Inf. Forens. Security 5 (2010) 492–506.
[6] G. Cao, Y. Zhao, R. Ni, X. Li, Contrast enhancement-based forensics in digital images, IEEE Trans. Inf. Forens. Security 9 (2014) 515–525.
[7] A.D. Rosa, M. Fontani, M. Massai, A. Piva, M. Barni, Second-order statistics analysis to cope with contrast enhancement counter-forensics, IEEE Signal Process. Lett. 22 (2015) 1132–1136.
[8] X. Zhao, S. Wang, S. Li, J. Li, Passive image-splicing detection by a 2D noncausal Markov model, IEEE Trans. Circuits Syst. Video Technol. 25 (2015) 185–189.
[9] I. Amerini, L. Ballan, R. Caldelli, A.D. Bimbo, G. Serra, A SIFT-based forensic method for copy-move attack detection and transformation recovery, IEEE Trans. Inf. Forens. Security 6 (2011) 1099–1110.
[10] J. Li, X. Li, B. Yang, X. Sun, Segmentation-based image copy-move forgery detection scheme, IEEE Trans. Inf. Forens. Security 10 (2015) 507–518.
[11] G. Cao, Y. Zhao, R. Ni, H. Tian, Anti-forensics of contrast enhancement in digital images, in: Proceedings of the 12th ACM Workshop on Multimedia and Security, Roma, Italy, Sep. 9–10, 2010.
[12] C.W. Kwok, O.C. Au, S.H. Chui, Alternative anti-forensics method for contrast enhancement, in: International Workshop on Digital Watermarking, Atlantic City, NY, Oct. 23–26, 2011.
[13] H. Ravi, A.V. Subramanyam, S. Emmanuel, ACE–an effective anti-forensic contrast enhancement technique, IEEE Signal Process. Lett. 23 (2016) 212–216.
[14] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in: Proceedings of the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, Nevada, Dec. 3–6, 2012.
[15] Y. Guo, Y. Liu, A. Oerlemans, S. Lao, S. Wu, M.S. Lew, Deep learning for visual understanding: A review, Neurocomputing 187 (2016) 27–48.
[16] J. Chen, X. Kang, Y. Liu, Z.J. Wang, Median filtering forensics based on convolutional neural networks, IEEE Signal Process. Lett. 22 (2015) 1849–1853.
[17] B. Bayar, M.C. Stamm, A deep learning approach to universal image manipulation detection using a new convolutional layer, in: Proceedings of the 4th ACM Workshop on Information Hiding and Multimedia Security, Vigo, Galicia, Spain, Jun. 20–22, 2016.
[18] Q. Wang, R. Zhang, Double JPEG compression forensics based on a convolutional neural network, EURASIP J. Info. Security 23 (2016).
[19] I. Amerini, T. Uricchio, L. Ballan, R. Caldelli, Localization of JPEG double compression through multi-domain convolutional neural networks, in: IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, Hawaii, Jul. 21–26, 2017.
[20] M. Barni, L. Bondi, N. Bonettini, P. Bestagini, A. Costanzo, M. Maggini, B. Tondi, S. Tubaro, Aligned and non-aligned double JPEG detection using convolutional neural networks, J. Vis. Commun. Image Represent. 49 (2017) 153–163.
[21] L. Bondi, S. Lameri, D. Güera, P. Bestagini, E.J. Delp, S. Tubaro, Tampering detection and localization through clustering of camera-based CNN features, in: IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, Hawaii, Jul. 21–26, 2017.
[22] V.U. Sameer, R. Naskar, N. Musthyala, K. Kokkalla, Deep learning based counter-forensic image classification for camera model identification, in: International Workshop on Digital Watermarking, Magdeburg, Germany, Aug. 23–25, 2017.
[23] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, in: Proceedings of the 32nd International Conference on Machine Learning, Lille, France, Jul. 6–11, 2015.
[24] D.P. Kingma, J. Ba, Adam: A method for stochastic optimization, in: International Conference on Learning Representations, San Diego, CA, May 7–9, 2015.
[25] T.Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, C.L. Zitnick, Microsoft COCO: Common objects in context, in: Proceedings of the 13th European Conference on Computer Vision, Zurich, Switzerland, Sep. 6–12, 2014.
[26] C. Szegedy, S. Ioffe, V. Vanhoucke, A.A. Alemi, Inception-v4, Inception-ResNet and the impact of residual connections on learning, in: Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, CA, Feb. 4–9, 2017.
[27] A.D. Rosa, M. Fontani, M. Massai, A. Piva, M. Barni, Demo second order CE detection. [Source code]. Available: https://lesc.dinfo.unifi.it/it/node/187.
[28] X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, in: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, May 13–15, 2010.
[29] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mane, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viegas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng, TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org. [Online]. Available: http://tensorflow.org/.
