IET Image Processing - 2018 - Lahrache - Rules of Photography For Image Memorability Analysis
Research Article
Abstract: Photos have become increasingly widespread in the digital age: cameras, smartphones and the Internet make large datasets of images available to a wide audience, and assessing the memorability of these photos has become a challenging task. Moreover, finding the best representative model for memorable images would enable memorability prediction. The authors develop a new approach based on rules of photography to evaluate image memorability. They use three groups of features: image basic features, layout features and image composition features. In addition, they introduce a diversified panel of classifiers based on data mining techniques for memorability analysis. They evaluate the proposed approach experimentally and compare its results with state-of-the-art approaches dealing with image memorability. The experimental results show that the models used in their approach are encouraging predictors of image memorability.
3.1 Colour features

Colour is an essential element of the art of photography, since it adds interest, effect and emotion to the captured image. Moreover, colour is the first element that strikes the viewer's eye and visual attention. Since visual attention and image memorability are related [4], some colour attributes may affect image memorability. In fact, colour can be measured by the following attributes [19]:

• Hue is described by a number that specifies the position of the corresponding pure colour on the colour wheel, as a fraction between 0 and 1. A value of 0 refers to red, 1/6 is yellow, 1/3 is green and so forth around the colour wheel.
• Saturation (S) of a colour describes how white the colour is. A pure red is fully saturated, with a saturation of 1; tints of red have saturations <1; and white has a saturation of 0.
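As an illustration (not part of the authors' pipeline, which is implemented in MATLAB), the hue and saturation attributes described above can be obtained from RGB values with Python's standard colorsys module:

```python
import colorsys

def hue_saturation(r, g, b):
    """Return (hue, saturation) for an RGB triple with components in [0, 1].

    Hue is a fraction of the colour wheel: 0 = red, 1/6 = yellow, 1/3 = green.
    Saturation: 1 = fully saturated pure colour, 0 = white/grey.
    """
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return h, s

print(hue_saturation(1.0, 0.0, 0.0))  # pure red -> (0.0, 1.0)
print(hue_saturation(1.0, 1.0, 0.0))  # yellow   -> hue 1/6, saturation 1.0
```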
4.1 Simplicity

Simplicity is one of the most important photography rules. An image that applies the simplicity rule contains an object with a simple background [27]. Background simplicity lets the viewer's eye catch and focus on the subject of interest in the image. Fig. 3 shows some sample images with simple and not simple layouts. Simplicity keeps information relatively simple within an image, in order to avoid distracting lines or objects that lead the eye away from the main subject, and to give more value to the object of interest. In our analysis, we use two simplicity features: the subject-of-interest compactness features and the background simplicity features [27].

Fig. 3 Sample images with simple and not simple backgrounds [27]. The top row: simple images. The bottom row: not simple images
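One plausible way to quantify background simplicity, in the spirit of [27], is the share of pixels covered by a few dominant quantised colours. The following is a hedged sketch only: the bin count and the number of dominant colours kept are assumptions, not the authors' exact feature.

```python
from collections import Counter

def background_simplicity(pixels, bins=8, top=2):
    """Score in (0, 1]: share of pixels falling in the `top` most frequent
    quantised colours. High values suggest a simple background."""
    # Quantise each (r, g, b) pixel (0-255 per channel) into a coarse bin.
    quantised = [
        (r * bins // 256, g * bins // 256, b * bins // 256)
        for (r, g, b) in pixels
    ]
    dominant = Counter(quantised).most_common(top)
    return sum(count for _, count in dominant) / len(quantised)

# A near-uniform background scores close to 1.0.
flat = [(200, 200, 200)] * 95 + [(10, 10, 10)] * 5
print(background_simplicity(flat))  # -> 1.0
```

In practice this would be computed on the background region only, after separating out the subject of interest.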
Fig. 7 Sample images which respect/do not respect the ROTs. The top row: images which respect the ROTs. The bottom row: images which do not respect the ROTs (the ROTs are applied on images from the LaMem dataset [12])

Symmetry within an image can be a property of the scene or of the objects inside the image. We use a simple and effective method, proposed by Loy and Eklundh [30], to detect the symmetries in an image. It allows the detection of multiple axes of symmetry, rotational symmetry and symmetric figures in complex backgrounds.

5.2 Leading lines

Leading or prominent lines [31] are all around us, such as roads, rivers, buildings and trees. They can be arranged within an image to lead towards something or towards infinity. This type of line gives the observer the impression that the lines go somewhere, creating a feeling of motion. In photography, these lines create a connection between the foreground and the background within an image, in order to lead the viewer's eyes deeper into the image and, commonly, to the main subject. Fig. 5 shows some sample images with leading/prominent lines.

We use the progressive probabilistic Hough transform method [33], a variation of the Hough transform [34], to extract prominent-line features. This method detects the most salient (longest) lines with the minimum amount of computation required. After the detection of lines, we extract some features related to these lines (number of lines, length, disposition and others) and then take the average of these features. These features are used to evaluate the interaction between image memorability scores and the extracted features.

6 Classification and regression

In Sections 3, 4 and 5, we have presented the three groups of features used in our analysis. We have employed basic features, which contain colour features and sharpness and blur features. We have also used image layout features, which include simplicity and repetitions and patterns. Then, we have utilised image composition features such as symmetry, leading lines, DoF and the ROTs. These groups of features are then combined and processed by several data mining techniques in order to evaluate image memorability.

Generally, data mining is an analytic process for data exploration and evaluation [37]. This process consists of analysing data, then looking for consistent or systematic relationships between variables/features, and finally creating predictive perspectives [37]. Data mining goals are achieved through many tasks, such as classification and regression.

Hereafter, we present the classification and regression methods used in our image memorability analysis.

6.1 Classification methods

Classification is a learning function used to predict the class to which a new instance belongs. We use a dataset for training in which each instance belongs to a specific known class [37]. Thus, the classification process aims to find a descriptive data model that distinguishes data classes by using the training data. We use three classification methods among those found in the literature: support vector machine (SVM) [38], radial basis function (RBF) network classifier [39] and J48 decision tree classifier [40]. These methods are used to evaluate the interaction between image memorability classes and the extracted features.
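The prominent-line features of Section 5.2 can be aggregated, once line segments have been detected (e.g. by an implementation of the progressive probabilistic Hough transform [33] such as OpenCV's HoughLinesP), as in this sketch; the exact feature set and the plain averaging are simplifying assumptions, not the authors' implementation:

```python
import math

def line_features(segments):
    """Aggregate simple features from detected line segments.
    Each segment is (x1, y1, x2, y2), as returned by a line detector."""
    if not segments:
        return {"count": 0, "mean_length": 0.0, "mean_angle": 0.0}
    lengths, angles = [], []
    for x1, y1, x2, y2 in segments:
        lengths.append(math.hypot(x2 - x1, y2 - y1))           # segment length
        angles.append(math.degrees(math.atan2(y2 - y1, x2 - x1)))  # disposition
    n = len(segments)
    return {
        "count": n,
        "mean_length": sum(lengths) / n,
        "mean_angle": sum(angles) / n,
    }

print(line_features([(0, 0, 3, 4), (0, 0, 6, 8)]))  # count 2, mean_length 7.5
```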
6.1.1 Support vector machine: SVM is a supervised machine learning algorithm used for classification and regression problems [38]. It performs classification tasks by constructing the hyperplanes that separate different class labels in a multidimensional space. SVM is designed for binary classification problems; in the case of multi-class classification, the solution is to transform the single multiclass problem into several binary classification problems [38].

SVM uses a set of mathematical functions defined as the kernel. The role of the kernel is to take data as input and transform it into the required form. Different SVM algorithms use different types of kernel functions, such as linear, non-linear, polynomial, RBF and sigmoid. The kernel functions return the inner product between two points in a suitable feature space. Thus, by defining a notion of similarity, we obtain a low computational cost even in very high-dimensional spaces.

6.1.2 RBF network classifier: The RBF network classifier is a type of neural network classifier [39]. The RBF classifier computes the input's similarity to examples from the training set. Each RBF neuron stores a 'prototype', which is an example from the training set. To classify a new input, each neuron computes the Euclidean distance between the input and its prototype. The RBF classifier uses one or multiple numeric inputs and generates one or multiple numeric outputs. The output values are determined by the input values and a set of parameters (RBF centroids, RBF widths, RBF weights and RBF biases).

The RBF network uses RBFs as activation functions. The output of the network is a linear combination of RBFs of the inputs and neuron parameters. Thus, the activation function 'links' the units in a layer to the values of the units in the succeeding layer. For the output layer, the activation function is the identity function, and the output units are simply weighted sums of the hidden units.

6.1.3 J48 decision tree classifier: A decision tree is a predictive learning method used to create a model that predicts a target variable by learning some simple decision rules based on predictor features. The J48 classifier implements the C4.5 algorithm [40], which is used to generate a decision tree employed in classification. The J48 method builds a classification function represented by a tree, which starts from the root and ends with the leaves. This function discriminates examples according to their classes, based on the attributes considered best among all the others according to a specific criterion.

6.2 Regression methods

Regression is a learning function used like a classification one. The two functions differ in the following way: in regression, we predict a value from a continuous set, whereas classification predicts to which class an instance belongs. We use three regression approaches in our study: support vector regression (SVR) [38], RBF network regressor [39] and M5P tree [41]. These approaches are used to evaluate the interaction between image memorability scores and the extracted features.

6.2.1 Support vector regression: The SVR [38] uses the same principles as the SVM for classification, with only a few differences, since the output is a real number. SVM can also be used as a regression method, maintaining the main ideas that characterise the algorithm: minimise the error and find the hyperplane which maximises the margin.

6.2.2 RBF network regressor: The RBF network regressor [39] is similar to the RBF classifier concept used in Section 6.1.2, with a slight difference: the RBF regressor is used for regression, so it deals with targets/scores (continuous values) instead of classes (discrete values).

6.2.3 M5P tree: The M5P tree [41] is a decision tree used for regression. It is a reconstruction of Quinlan's M5 algorithm [42], extended to include the use of trees in regression models. M5P combines a conventional decision tree with the possibility of linear regression functions at the nodes. This multivariate tree algorithm is utilised for noise removal and error reduction.

7 Experimental set-up

In this section, we present the experimental process of our approach and the methods used to evaluate the proposed features. For the experimental tasks, we use MATLAB to extract and calculate the proposed features, and the WEKA toolkit [43] to evaluate image memorability using the selected features. The WEKA toolkit is a collection of data pre-processing tools, machine learning algorithms (classification, regression, clustering etc.) and visualisation tools.

7.1 Datasets

We evaluate the performance of our approach on two different datasets, the Massachusetts Institute of Technology (MIT) image memorability dataset [1] and the LaMem dataset [12]. The MIT dataset contains 2222 images randomly sampled, with different scene categories, from the scene understanding (SUN) dataset [44]. Images are cropped and resized to 256 × 256, and each image has a specific memorability score. Fig. 8 shows sample images from the MIT image memorability dataset [1].

On the other side, the LaMem dataset contains 60,000 images sampled from a number of existing datasets, such as SUN [44], the aesthetic visual analysis (AVA) dataset [45] and NUS eye fixation (NUSEF) [46]. This dataset includes different types of images. Fig. 9 shows sample images from the LaMem dataset.

Each image in both datasets has a specific memorability score. The memorability scores for the images are calculated using a 'visual memory game' realised with Amazon Mechanical Turk [47]. First, the game shows participants a sequence of images and asks them to press the space bar whenever they see a repeated image. Second, the responses of all participants are collected; the memorability scores are then computed from these responses.
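The memory-game scoring can be sketched as follows. This is a simplified illustration only: counting, per image, the fraction of participants who correctly flagged the repeat is an assumption, not the exact protocol of [1, 12].

```python
def memorability_score(responses):
    """Fraction of participants who pressed the space bar (i.e. correctly
    recognised the image as a repeat) when it was shown a second time."""
    hits = sum(1 for pressed in responses if pressed)
    return hits / len(responses)

# Three of four participants recognised the repeated image.
print(memorability_score([True, True, False, True]))  # -> 0.75
```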
Table 1 Regression results of the proposed features, the three groups of features and their combination used for image
memorability evaluation, based on MIT and LaMem datasets
Desc Colour Sharp/blur Symm Lines DoF ROT Patt/Rept Simpl Basic Compo Layout All
MIT dataset
SVR 0.5343 0.5051 0.5873 0.5852 0.5738 0.6052 0.6032 0.6167 0.5391 0.6151 0.6219 0.6362
RBF 0.5366 0.5052 0.5882 0.5872 0.5745 0.6155 0.5989 0.6021 0.5459 0.6189 0.6088 0.6375
Tree 0.5219 0.4964 0.5853 0.5845 0.5737 0.6102 0.5975 0.6028 0.5434 0.6156 0.6049 0.6261
LaMem dataset
SVR 0.5853 0.5796 0.6106 0.6001 0.5941 0.6341 0.6249 0.6352 0.6189 0.6385 0.6402 0.6502
RBF 0.5971 0.5806 0.6241 0.6028 0.5988 0.6388 0.6285 0.6406 0.6203 0.6391 0.6435 0.6535
Tree 0.6004 0.5822 0.6224 0.6085 0.6005 0.6405 0.6241 0.6436 0.6218 0.6424 0.6445 0.6545
Table 2 Classification results of the three proposed groups of features and their combination used for image memorability
evaluation, based on MIT and LaMem datasets
Desc Colour Sharp/blur Symm Lines DoF ROT Patt/Rept Simpl Compo Basic Layout All
MIT dataset
SVM, % 59.69 53.39 54.83 56.59 53.71 59.42 53.84 61.76 61.78 58.93 63.39 65.30
RBF, % 60.91 53.56 55.55 56.63 56.05 59.51 53.43 63.70 62.84 59.78 65.27 68.34
Tree, % 57.31 54.06 54.20 52.76 53.39 59.06 56.77 54.65 58.77 56.09 60.41 61.66
LaMem dataset
SVM, % 63.44 57.70 77.87 77.88 76.84 81.05 79.21 80.51 80.35 64.79 81.25 81.34
RBF, % 64.08 57.81 78.25 77.92 77.17 81.10 81.07 81.55 81.42 65.17 82.05 82.92
Tree, % 64.81 57.87 77.93 77.92 76.97 80.14 72.14 78.29 79.76 65.05 79.58 78.92
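The comparison with prior work is made in terms of Spearman's rank correlation ϱ between predicted and measured memorability rankings. A minimal sketch of the statistic (no tie handling, for simplicity; the authors' experiments use the WEKA toolkit rather than this code):

```python
def spearman_rho(a, b):
    """Spearman rank correlation between two equal-length score lists,
    assuming no tied values."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order):
            r[i] = rank
        return r

    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))  # squared rank differences
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Identical orderings give a perfect correlation of 1.0.
print(spearman_rho([0.9, 0.4, 0.7], [0.8, 0.1, 0.5]))  # -> 1.0
```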
The aim is to determine how close the model's Spearman rank correlation is to these human consistency rates. Since the works found in the literature compare their results with the baseline study [1], we compare our results with these works, with the goal of approaching the human consistency of 0.75. The best feature combination proposed by Isola et al. [1] gives a result of ϱ = 0.46. The global and full models of Khosla et al. [3] reach a prediction of ϱ = 0.50; they employed more complex features, such as object and scene annotations describing the spatial layout, content and image aesthetic properties. The study by Kim et al. [9] achieves better performance (ϱ = 0.58) by using a large descriptor dimension. Celikkale et al.'s results [7] reach a rank correlation of ϱ = 0.47. Kim et al. [9] thus come close to the human consistency score, with a small difference (0.17), while our best result is close to this best score (0.75). Wang et al. [8] obtain a result of ϱ = 0.49 by employing an attention-based model to predict memorability. Peng et al. [11] achieve a rank correlation of ϱ = 0.52 by using an MAR model. Khosla et al. [12] reach a result of 0.63 by using fine-tuned deep features, which are expensive in terms of execution time and computational resources. Table 3 summarises all the cited results.

Also, in our classification models, we employ class labels instead of scores, so we do not have a ranking on which to compute the Spearman rank correlation. However, if we suppose that the top classification rate is 100%, which is perfect memorability classification, our experiments give results ranging between 53.39 and 82.92%. These are strong accuracies in terms of classification. Besides, these results outperform a previous result (64.84%) obtained by Lahrache et al. [13], who deal with image memorability as a classification problem.

9 Conclusion

This paper investigates the relationship between memorability and features inspired by photography rules. We have used three groups of features. The first covers image basic features, which contain colour, sharpness and blur features. The second represents image layout features, composed of simplicity and repetitions and patterns features. The third depicts image composition features, consisting of leading lines, symmetry, ROTs and DoF features. We then examine the impact of various regression and classification methods on image memorability analysis. In summary, we have presented a novel approach to evaluating image memorability which uses different features related to photography rules. All the employed features give strong results in terms of regression and classification. Experimental results show the effectiveness of using features inspired by photography rules, together with several data mining approaches, for memorability assessment. In future work, we will explore other rules of photography for memorability assessment. Besides, we will use other image datasets to evaluate our image memorability approach based on rules of photography.

10 References
[1] Isola, P., Xiao, J., Parikh, D., et al.: 'What makes a photograph memorable?', IEEE Trans. Pattern Anal. Mach. Intell., 2014, 36, (7), pp. 1469–1482
[2] Isola, P., Parikh, D., Torralba, A., et al.: 'Understanding the intrinsic memorability of images'. Advances in Neural Information Processing Systems, Granada, Spain, December 2011, pp. 2429–2437
[3] Khosla, A., Xiao, J., Torralba, A., et al.: 'Memorability of image regions'. Advances in Neural Information Processing Systems, Lake Tahoe, Nevada, United States, December 2012, vol. 25, pp. 305–313
[4] Mancas, M., Le Meur, O.: 'Memorability of natural scenes: the role of attention'. Proc. IEEE Int. Conf. Image Process., Melbourne, Australia, September 2013, pp. 196–200
[5] Redies, C.: 'A universal model of esthetic perception based on the sensory coding of natural stimuli', Spat. Vis., 2007, 21, (1–2), pp. 97–117
[6] Liu, L., Chen, R., Wolf, L., et al.: 'Optimizing photo composition', Comput. Graph. Forum, 2010, 29, (2), pp. 469–478
[7] Celikkale, B., Erdem, A., Erdem, E.: 'Visual attention-driven spatial pooling for image memorability'. IEEE Computer Vision and Pattern Recognition Workshops, Portland, OR, USA, June 2013, pp. 976–983
[8] Wang, W., Sun, J., Li, J., et al.: 'Investigation on the influence of visual attention on image memorability'. Image and Graphics – 8th Int. Conf., Tianjin, China, August 2015, pp. 573–582
[9] Kim, J., Yoon, S., Pavlovic, V.: 'Relative spatial features for image memorability'. 21st ACM Int. Conf. Multimedia Proc., Barcelona, Spain, October 2013, pp. 761–764
[10] Bylinskii, Z., Isola, P., Bainbridge, C., et al.: 'Intrinsic and extrinsic effects on image memorability', Vis. Res., 2015, 116, pp. 165–178
[11] Peng, H., Li, K., Li, B., et al.: 'Predicting image memorability by multi-view adaptive regression'. 23rd ACM Int. Conf. Multimedia Proc., Brisbane, Australia, October 2015, pp. 1147–1150
[12] Khosla, A., Raju, A., Torralba, A., et al.: 'Understanding and predicting image memorability at a large scale'. IEEE Int. Conf. Computer Vision, Santiago, Chile, December 2015, pp. 2390–2398
[13] Lahrache, S., El Ouazzani, R., El Qadi, A.: 'Bag-of-features for image memorability evaluation', IET Comput. Vis., 2016, 10, (6), pp. 1–9
[14] Borkin, M., Vo, A.A., Bylinskii, Z., et al.: 'What makes a visualization memorable?', IEEE Trans. Vis. Comput. Graph., 2013, 19, (12), pp. 2306–2315
[15] Han, J., Chen, C., Shao, L., et al.: 'Learning computational models of video memorability from fMRI brain imaging', IEEE Trans. Cybern., 2015, 45, (8), pp. 1692–1703
[16] Aydın, T., Smolic, A., Gross, M.: 'Automated aesthetic analysis of photographic images', IEEE Trans. Vis. Comput. Graph., 2015, 21, (1), pp. 31–42
[17] Lo, K.-Y., Liu, K.-H., Chen, C.: 'Intelligent photographing interface with on-device aesthetic quality assessment'. Computer Vision – ACCV Workshops, Daejeon, Korea, November 2012, pp. 533–544
[18] Ng, W.-S., Kao, H.-C., Yeh, C.-H., et al.: 'Automatic photo ranking based on esthetics rules of photography'. Technical report, National Chengchi University, Taipei, Taiwan, 2009
[19] Bora, D., Gupta, A., Khan, F.: 'Comparing the performance of L*A*B* and HSV color spaces with respect to color image segmentation', CoRR, abs/1506.01472, 2015, pp. 192–203
[20] Gao, X., Xin, J., Sato, T., et al.: 'Analysis of cross-cultural color emotion', Color Res. Appl., 2007, 32, (3), pp. 223–229
[21] Schanda, J.: 'CIE colorimetry', in Schanda, J. (Ed.): 'Colorimetry: understanding the CIE system' (John Wiley & Sons, Hoboken, NJ, 2007), pp. 25–78
[22] Crete, F., Dolmiere, T., Ladret, P., et al.: 'The blur effect: perception and estimation with a new no-reference perceptual blur metric'. Conf. Human Vision and Electronic Imaging XII, San Jose, CA, USA, January–February 2007, p. 64920I
[23] Pertuz, S., Puig, D., Garcia, M.A.: 'Analysis of focus measure operators for shape-from-focus', Pattern Recognit., 2013, 46, (5), pp. 1415–1432
[24] Yang, G., Nelson, B.: 'Wavelet-based autofocusing and unsupervised segmentation of microscopic images'. Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Las Vegas, Nevada, USA, October 2003, vol. 3, pp. 2143–2148
[25] Thelen, A., Frey, S., Hirsch, S., et al.: 'Improvements in shape-from-focus for holographic reconstructions with regard to focus operators, neighborhood size, and height value interpolation', IEEE Trans. Image Process., 2009, 18, (1), pp. 151–157
[26] Minhas, R., Mohammed, A., Wu, Q.: 'An efficient algorithm for focus measure computation in constant time', IEEE Trans. Circuits Syst. Video Technol., 2012, 22, (1), pp. 152–156
[27] Mai, L., Le, H., Niu, Y., et al.: 'Detecting rule of simplicity from photos'. ACM Multimedia, New York, NY, USA, October–November 2012, pp. 1149–1152
[28] Healey, C., Enns, J.: 'Attention and visual memory in visualization and computer graphics', IEEE Trans. Vis. Comput. Graph., 2012, 18, (7), pp. 1170–1188
[29] 'Matlab Central'. Available at http://www.mathworks.com/matlabcentral/fileexchange/36484-local-binary-patterns/, accessed April 2016
[30] Loy, G., Eklundh, J.-O.: 'Detecting symmetry and symmetric constellations of features'. 9th European Conf. Computer Vision, Graz, Austria, May 2006, pp. 508–521
[31] 'Digital photography school, how to use leading lines for better composition'. Available at http://digital-photography-school.com/, accessed April 2016
[32] Datta, R., Joshi, D., Li, J., et al.: 'Studying aesthetics in photographic images using a computational approach'. 9th European Conf. Computer Vision, Graz, Austria, May 2006, pp. 288–301
[33] Matas, J., Galambos, C., Kittler, J.: 'Robust detection of lines using the progressive probabilistic Hough transform', Comput. Vis. Image Underst., 2000, 78, (1), pp. 119–137
[34] Ballard, D.: 'Generalizing the Hough transform to detect arbitrary shapes', Pattern Recognit., 1981, 13, (2), pp. 111–122
[35] Dhar, S., Ordonez, V., Berg, T.: 'High level describable attributes for predicting aesthetics and interestingness'. Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, June 2011, pp. 1657–1664