
IET Image Processing

Research Article

Rules of photography for image memorability analysis

ISSN 1751-9659
Received on 15th June 2017
Revised 27th January 2018
Accepted on 5th March 2018
E-First on 23rd March 2018
doi: 10.1049/iet-ipr.2017.0631
www.ietdl.org

Souad Lahrache1, Rajae El Ouazzani1, Abderrahim El Qadi2


1TIM Team, High School of Technology, Moulay Ismail University, Meknes, Morocco
2LASTIMI, High School of Technology, Mohammed V University, Rabat, Morocco
E-mail: souadlahrache@gmail.com

Abstract: Photos have become increasingly widespread in the digital age. Cameras, smartphones and the Internet make large datasets of images available to a wide audience, and assessing the memorability of these photos is a challenging task. Moreover, finding the best representative model for memorable images would enable memorability prediction. The authors develop a new approach based on rules of photography to evaluate image memorability. Specifically, they use three groups of features: image basic features, layout features and image composition features. In addition, they introduce a diversified panel of classifiers based on data mining techniques used for memorability analysis. They experiment with the proposed approach and compare its results to state-of-the-art approaches dealing with image memorability. The experimental results show that the models used in their approach are encouraging predictors of image memorability.

1 Introduction

Attention, eye movements and memory let humans depict a coherent scene in their minds. The human eye is composed of complex optical components, which allow images to be detected and their contents interpreted by the brain. The combination of optical components, visual attention and human memory leads to scene perception, and between perception and memorability there are different interfering factors. Actually, memorability constitutes an intrinsic measure which depends only on how information is structured within an image [1]. Humans have an exceptional visual memory [2], which memorises particular characteristics of scenes well, even after a first exposure [3]. In fact, the analysis of eye-tracking experiments shows that human behaviour and image memorability are related [4]. As we know, the human eye tends to prefer images with a coherent arrangement and a specific composition [5]. In photography, it is not only the objects captured in the picture that matter but also the manner of taking that picture. In fact, there are some photography rules which are considered as guidelines for image creation or composition.

An image can be represented as a combination of elements such as objects, regions or parts. Sometimes, these elements follow specific composition and arrangement rules [6]. Since image memorability is inherent to the image, memorability can be related to how the image is created and composed. Thus, in this paper, we present an approach based on photography rules for image memorability assessment. We employ three feature vectors: a basic features vector, a layout features vector and a composition features vector. Basic features contain colour features using the HSV and L*a*b* spaces and sharpness/blur measures. Layout features are used for simplicity and for repetition and pattern detection measures. Besides, composition features group symmetry and leading lines detection, depth of field (DoF) and rule of thirds (ROTs) measures.

This paper is organised as follows: in Section 2, we review some related works; in Section 3, we present image basic features; in Section 4, we expose the details of image layout features; in Section 5, we present image composition features; in Section 6, we present the classification and regression methods used in our analysis; in Section 7, we state our approach experiments and results; and finally, we conclude the paper and give some perspectives. An overview of our approach is presented in Fig. 1.

2 Related works

Isola et al. [1] prove that image memorability is an intrinsic measure. So, they used different factors such as simple object statistics, scene semantics and global image features to study image memorability. Thereafter, in [2], the same authors show that the utilisation of a set of spatial descriptions, aesthetic properties and content can improve image memorability assessment results. Khosla et al. proposed, in [3], a probabilistic framework with local image regions to get image memorability maps. Besides, Mancas and Le Meur [4], Celikkale et al. [7] and Wang et al. [8] proposed three different attention-based methods to estimate image memorability. In [9], Kim et al. presented two spatial features for image memorability prediction: the weighted object area, which jointly considers the location and size of objects, and the relative area rank, which captures the relative unusualness of objects' sizes. In a recent study [10], the authors investigate the interplay between intrinsic and extrinsic factors that affect image memorability. In [11], Peng et al. propose a novel multi-view adaptive regression (MAR) model to automatically estimate image memorability. Later, Khosla et al. [12] introduced a novel experimental procedure to measure human memory objectively, in order to build the largest annotated image memorability dataset to date. Also, they use fine-tuned deep features for image memorability evaluation. In a recent work [13], Lahrache et al. assess image memorability by using the bag-of-features, which is a visual feature descriptor. They deal with image memorability as a classification problem. Borkin et al. [14] conducted a study about visualisation memorability. In this study, they assigned memorability scores to many visualisations collected from news media sites, government reports, scientific journals and infographic sources. Thereafter, they studied the relation between memorability and attributes extracted from these visualisations. Meanwhile, Han et al. [15] presented a new framework to model and automatically predict the memorability of videos.

Photography rules are used in different image processing tasks, such as ranking, classification, quality assessment and the study of aesthetics in images. In [16], the authors use aesthetic attribute metrics based on photography rules in order to build the concepts of no-reference image quality assessment. They present a perceptually calibrated system for automatic aesthetic evaluation of photographic images. Besides, in [17], the authors present an efficient method for instant photo aesthetics quality assessment.

Fig. 1 Overview of our approach

They employ several efficient aesthetic features by applying photography rules, which experimentally reach great classification performance. Ng et al. propose in [18] an automatic photo ranking based on aesthetic rules of photography. So, they present a quantitative analysis method of photo composition based on well-known photography rules: horizon balance, intensity balance, locations of regions of interest, line patterns and merger avoidance. Besides, they conduct a user study whose results confirm the hypothesis that automatic photo ranking is effective [18].

Our approach for modelling image memorability is structurally different from existing approaches dealing with image memorability evaluation. Specifically, we construct features in accordance with some popular photography rules. We use memorability scores and extracted features to build the data that will be handled with some machine learning methods. So, we use machine learning techniques to find the best classifiers for image memorability evaluation. In summary, we analyse image memorability evaluation as regression and classification problems.

3 Extraction of image basic features

An image analysis can be performed at different levels: a basic level or a deep one. This analysis enables the extraction of diverse visual information. Actually, we can perform a basic image analysis to extract some interesting visual characteristics, among them colour features and sharpness or blur features, since colour and sharpness constitute the first image elements which catch visual attention for further memory processing.

3.1 Colour features

Colour is an essential element in the art of photography, since it adds interest, effects and emotion to the captured image. Moreover, colour is the first element that strikes the viewer's eye and visual attention. Since visual attention and image memorability are related [4], some colour attributes may affect image memorability. In fact, colour can be measured by the following attributes [19]: brightness, hue, colourfulness, lightness, chroma and saturation. In a colour science study realised by Gao et al. [20], the authors show that the influence of cultural background on human emotions or feelings about some colours or colour combinations is limited. Besides, the authors show that chroma and lightness are the most important factors affecting emotions through colours. Colours can be mathematically represented or computed according to different colour spaces [19], where the L*a*b* and the HSV colour spaces are the most appropriate for perception [19]. Fig. 2 shows an image in the RGB colour space and its corresponding images in the HSV and L*a*b* colour spaces.

Fig. 2 Image in RGB and its corresponding in HSV and L*a*b* colour spaces
(a) Original image, (b) Converted image in HSV, (c) Converted image in L*a*b*

The L*a*b* colour space is based on one luminance channel (lightness) and two other colour channels (a and b), known as the chromaticity layers. These channel values are formulated as [21]

$L^* = 116\, f(Y/Y_n) - 16$  (1)
$a^* = 500\, [f(X/X_n) - f(Y/Y_n)]$  (2)
$b^* = 200\, [f(Y/Y_n) - f(Z/Z_n)]$  (3)

where

$f(t) = \begin{cases} t^{1/3} & \text{if } t > \delta^{3} \\ \dfrac{t}{3\delta^{2}} + \dfrac{4}{29} & \text{otherwise} \end{cases}, \qquad \delta = \dfrac{6}{29}$

and $X_n$, $Y_n$ and $Z_n$ are the CIE XYZ tristimulus values of the reference white point.

In the HSV colour space, 'H' stands for 'hue', 'S' stands for 'saturation' and 'V' stands for 'value'. While hue is used to distinguish colours, saturation is the percentage of white light added to a pure colour, and value refers to the perceived light intensity. The HSV colour wheel is represented as a cone or cylinder [19]:

• Hue is described by a number that specifies the position of the corresponding pure colour on the colour wheel, as a fraction between 0 and 1. Value 0 refers to red, 1/6 is yellow, 1/3 is green and so forth around the colour wheel.
• Saturation (S) of a colour describes how white the colour is. A pure red is fully saturated, with a saturation of 1; tints of red have saturations below 1; and white has a saturation of 0.
• Value (V) of a colour, also called its lightness, describes how dark the colour is. A value of 0 is black, with increasing lightness moving away from black.
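For illustration, the following minimal sketch derives simple per-channel statistics in both colour spaces, assuming Python with scikit-image as a stand-in for our MATLAB extraction; the exact statistics composing our colour feature vector are not detailed here, so means and standard deviations serve as illustrative stand-ins.

```python
# Minimal sketch (assumption: scikit-image available); the statistics below are
# illustrative stand-ins, not the paper's exact colour feature vector.
import numpy as np
from skimage import io, img_as_float
from skimage.color import rgb2hsv, rgb2lab

def colour_features(path):
    rgb = img_as_float(io.imread(path))[..., :3]  # RGB in [0, 1], alpha dropped
    hsv = rgb2hsv(rgb)                            # H, S, V channels in [0, 1]
    lab = rgb2lab(rgb)                            # L* in [0, 100]; a*, b* signed
    feats = []
    for space in (hsv, lab):
        for c in range(3):                        # mean and spread per channel
            feats += [space[..., c].mean(), space[..., c].std()]
    return np.asarray(feats)                      # 12-dimensional descriptor
```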

3.2 Sharpness and blur

Sharpness is a photographic factor that determines the clarity of details within an image. In a sharp picture, small details are much more perceptible and the overall structure is more interesting to the human eye. So, sharpness can be part of what affects perception and memory. We use two kinds of parameters (a sharpness factor and focus measures) to study the effect of sharpness on image memorability.

The sharpness factor is calculated by using a no-reference perceptual blur metric [22]. This metric takes a value in the range [0, 1], where 0 and 1 are the best and the worst measure in terms of blur perception, respectively. Also, this metric is independent of any edge detector, and it is based on the discrimination between different levels of perceptible blur in the same picture [22].

The focus measure is another measure related to blur. It is a quantity which measures the degree of image blurring. This degree reaches its maximum value when the image is well focused and decreases when the blur increases [23]. Diverse methods and operators have been implemented to measure either the image focus degree or the per-pixel focus degree [23]. We use three different operators to measure the relative degree of focus of an image:

• The wavelet-based operators [24], whose coefficients are utilised to measure the focus level. They take advantage of the capability of the discrete wavelet transform coefficients to describe the frequency and spatial content of images.
• The Laplacian-based operator [25], which follows the assumption that focused images have more sharp edges than blurred ones. This operator measures the amount of edges in images through the utilisation of the second derivative, or Laplacian.
• The focus measure based on steerable filters [26], which is computed on a filtered version of the image. It uses steerable filters calculated by means of integral images.
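As an illustration of the Laplacian-based family, the sketch below scores focus as the variance of the Laplacian response, a common variant of the operator discussed in [25]; it is a minimal sketch assuming Python with OpenCV, not our exact implementation.

```python
# Minimal sketch of a Laplacian-based focus measure (variance of Laplacian);
# a common variant of the operator family in [25], not the paper's exact code.
import cv2

def laplacian_focus(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    lap = cv2.Laplacian(gray, cv2.CV_64F)  # second-derivative (edge) response
    return lap.var()                       # higher variance => sharper image
```

In line with the behaviour described in [23], this quantity is maximal for a well-focused image and drops as blur increases.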
4 Extraction of image layout features

Layout analysis is another level of image analysis. It is used to find out how the image elements are disposed and arranged. We use simplicity and repetition-and-pattern features to study the image layout. Actually, an image with fewer details is easier to remember than another one containing many details; in other words, fewer details mean simpler content. Meanwhile, repeating a shape or a design throughout an image can underline the image texture in order to catch the viewer's attention.

4.1 Simplicity

Simplicity is one of the most important photography rules. An image that applies the simplicity rule is an image which contains an object against a simple background [27]. So, the simplicity of the image background lets the viewer's eye catch and focus on the subject of interest in the image. Fig. 3 shows some sample images with simple and non-simple layouts. Actually, simplicity keeps information relatively simple within an image, in order to avoid distracting lines or objects that lead the eye away from the main subject, and to give more value to the object of interest. In our analysis, we use two simplicity features: the subject-of-interest compactness features and the background simplicity features [27].

Fig. 3 Sample images with simple and not simple background [27]. The top row: simple images. The bottom row: not simple images
4.2 Repetitions and patterns

A pattern is a combination of elements or shapes repeated in a recurring and regular arrangement; it is often used symbolically to represent the many things that occur in nature. So, patterns and repetitions can be found all around us: a row of trees, a field of sunflowers, among streets or buildings. Furthermore, multiple kinds of repetitions and patterns can be found in captured photos. We can find an object, a form or a shape that is repeated, or a combination of objects/shapes repeated in a regular arrangement. Even if objects are perceived with a single texture, they are made up of many small and repeated elements.

Actually, human perception plays an important role in the area of visualisation because it improves both the quality and the quantity of displayed information [28]. This information covers colours, shapes, patterns etc. Therefore, a viewer pre-attentively identifies the presence of spatial patterns formed by different shapes, even with background colour variations [28]. Also, if a colour remains constant across the display, the same shape patterns are immediately visible. Fig. 4 shows some sample images which contain repetitions and patterns.

Fig. 4 Some sample images containing repetitions and patterns from LaMem dataset [12]

A pattern is created by the repetition of (but not limited to) shape, line, colour or texture. In our paper, repetition and pattern features refer to features based on a texture descriptor. We have chosen to build our rotation-invariant texture descriptor on local binary patterns (LBPs). LBP is an operator for image description that is based on the signs of the differences between neighbouring pixels. To calculate pattern and repetition features, we use an implementation of LBP [29] that supports multi-colour inputs and rotation invariance. Actually, it tests the relationship between a pixel and its neighbours by encoding the signs of their differences as a binary number, which allows the detection of pattern and repetition features.
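For illustration, the sketch below computes a rotation-invariant uniform LBP histogram; it is a minimal sketch assuming Python with scikit-image, standing in for the MATLAB implementation [29] used in our experiments.

```python
# Minimal sketch of a rotation-invariant LBP texture histogram (assumption:
# scikit-image); stands in for the MATLAB LBP implementation [29] we used.
import numpy as np
from skimage import io, color
from skimage.feature import local_binary_pattern

def lbp_histogram(path, P=8, R=1):
    gray = color.rgb2gray(io.imread(path))
    # 'uniform' gives rotation-invariant uniform patterns with P + 2 code values
    codes = local_binary_pattern(gray, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist  # normalised texture descriptor
```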

5 Extraction of image composition features

A composition is the combination of distinct elements or components into a final representation. Composition analysis carries information about the image structure and the disposition of image elements. In the following, we explore some image composition features, namely symmetry [30], leading lines [31], DoF [32] and ROTs [6].

5.1 Symmetry

An object can have a vertical and/or a horizontal line of symmetry, and can have a rotational symmetry. In photography rules, symmetry is employed to give balance to the photo [30]. So, symmetry occurs when one part of the image is identical, similar or near-identical to its other parts. Moreover, symmetry within an image can be a property of the scene or of the objects inside the image. Actually, we use a simple and effective method, proposed by Loy and Eklundh [30], to detect the symmetries in an image. It allows the detection of multiple axes of symmetry, rotational symmetry and symmetric figures in complex backgrounds.
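The Loy-Eklundh detector matches mirrored keypoint pairs and is beyond a few lines of code; as a rough illustration of the idea of measuring bilateral symmetry, the sketch below scores the correlation between an image and its horizontal mirror. It is only a crude proxy, not the method of [30].

```python
# Crude bilateral-symmetry proxy: correlation between the image and its mirror.
# Illustrative only; NOT the Loy-Eklundh detector [30] used in the paper.
import numpy as np
from skimage import io, color

def mirror_symmetry_score(path):
    gray = color.rgb2gray(io.imread(path))
    mirrored = gray[:, ::-1]                  # flip around the vertical axis
    a = gray - gray.mean()
    b = mirrored - mirrored.mean()
    return float((a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum() + 1e-12))
```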

5.2 Leading lines

Leading or prominent lines [31] are all around us, in roads, rivers, buildings and trees. They can be arranged within an image to lead towards something or towards infinity. Actually, this type of line gives the observer the impression that the lines go somewhere, by creating a feeling of motion. In photography, these lines create a connection between the foreground and the background of an image, in order to lead the viewer's eyes deeper into it, commonly towards the main subject. Fig. 5 shows some sample images with leading/prominent lines.

Fig. 5 Sample images with leading/prominent lines from LaMem dataset [12]

So, we use the progressive probabilistic Hough transform [33], a variation of the Hough transform [34], to extract prominent line features. This method detects the most salient (longest) lines with the minimum amount of computation required to detect lines. After the detection of lines, we extract some features related to these lines (number of lines, length, disposition and other features) and then we take the average of these features. These features will be used, in addition to memorability scores, to predict image memorability.
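The sketch below shows how such line statistics can be gathered with the progressive probabilistic Hough transform; it is a minimal sketch assuming Python with OpenCV (cv2.HoughLinesP), and the edge and accumulator thresholds are illustrative values, not the paper's settings.

```python
# Minimal sketch: leading-line statistics via the progressive probabilistic
# Hough transform [33] (OpenCV's HoughLinesP); thresholds are illustrative.
import cv2
import numpy as np

def line_features(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                            minLineLength=40, maxLineGap=10)
    if lines is None:
        return np.zeros(3)
    pts = lines[:, 0, :].astype(float)          # rows of (x1, y1, x2, y2)
    lengths = np.hypot(pts[:, 2] - pts[:, 0], pts[:, 3] - pts[:, 1])
    angles = np.arctan2(pts[:, 3] - pts[:, 1], pts[:, 2] - pts[:, 0])
    return np.array([len(lengths), lengths.mean(), angles.mean()])
```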

5.3 Depth of field

Within an image, when edges are very clear and easy to recognise, it is easier for the human brain to assign 'depth' or 'dimensionality' to the image. As humans, we live in a three-dimensional world, and when our photos contain more depth or dimensionality, those photos act as memory triggers and help us recall the original scene of the event in our minds. Here, we use the DoF measure [32] to capture the idea of depth or dimensionality. The image is divided into 16 equal rectangular blocks $\{M_1, \ldots, M_{16}\}$, numbered in row-major order. Let $W_3 = \{w_{lh3}, w_{hl3}, w_{hh3}\}$ denote the set of level-3 high-frequency wavelet coefficients of the hue image $I_H$. The low-DoF measure $f_{53}$ for hue is formulated as follows, with $f_{54}$ and $f_{55}$ being computed similarly for $I_S$ and $I_V$, respectively:

$f_{53} = \dfrac{\sum_{(x, y) \in M_6 \cup M_7 \cup M_{10} \cup M_{11}} w_3(x, y)}{\sum_{i = 1}^{16} \sum_{(x, y) \in M_i} w_3(x, y)}$  (4)

This measure evaluates how much of a photo is in focus. An image which displays a low DoF is one where objects within a small range of depths are captured in sharp focus, while objects at other depths are blurred (a technique often used to emphasise an object of interest) [35]. Besides, with a low DoF, only a small area of the photo is in focus, so the viewer's eyes concentrate on that part of the photo and the subject is isolated from its surroundings. Fig. 6 illustrates the DoF.

Fig. 6 Samples illustrating the DoF
(a) Image with a low DoF (DoF = 0.9), (b) Image with a non-low DoF (DoF = 10)
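A minimal sketch of (4) for the hue channel follows, assuming Python with PyWavelets and scikit-image; the wavelet family ('db1') and the use of coefficient magnitudes summed across the three level-3 detail sub-bands are assumptions made for illustration, since (4) does not fix them here.

```python
# Minimal sketch of the low-DoF measure in (4) for the hue channel
# (assumptions: PyWavelets + scikit-image; 'db1' wavelet chosen for illustration).
import numpy as np
import pywt
from skimage import io
from skimage.color import rgb2hsv

def low_dof_measure(path):
    hue = rgb2hsv(io.imread(path)[..., :3])[..., 0]
    # level-3 detail sub-bands w_lh3, w_hl3, w_hh3 of the hue image
    cH, cV, cD = pywt.wavedec2(hue, "db1", level=3)[1]
    w3 = np.abs(cH) + np.abs(cV) + np.abs(cD)
    h, w = w3.shape
    hq, wq = h // 4, w // 4                   # 4 x 4 grid of blocks M1..M16
    centre = w3[hq:3 * hq, wq:3 * wq]         # central blocks M6, M7, M10, M11
    return centre.sum() / (w3.sum() + 1e-12)  # f_53 in (4)
```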

5.4 Rule of thirds

The ROTs is a well-known rule in the world of photography [6]. The ROTs states that an image can be divided into three horizontal sections and three vertical sections; the intersections of the central horizontal and vertical lines constitute the best locations for the most important subjects or points of interest. Fig. 7 shows some images which respect the ROTs and others which do not. We use the method presented in [36], which combines saliency and objectness approaches to compute the ROTs features. This method achieves good results.

Fig. 7 Sample images which respect/do not respect the ROTs. The top row: images which respect the ROTs. The bottom row: images which do not respect the ROTs (the ROTs is applied on images from LaMem dataset [12])
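The detector in [36] learns from saliency and objectness maps; as a simpler illustration of the geometry involved, the sketch below scores how close a given subject position lies to the nearest thirds intersection. It is only a crude proxy, not the method of [36]; the subject position (sx, sy) is assumed to come from some saliency stage, and the exponential scoring is illustrative.

```python
# Illustrative rule-of-thirds score: distance from a subject position to the
# nearest thirds intersection, mapped to (0, 1]. NOT the learned detector [36];
# (sx, sy) is assumed to come from a saliency/objectness stage.
import numpy as np

def thirds_score(sx, sy, width, height):
    ints = [(width * i / 3.0, height * j / 3.0) for i in (1, 2) for j in (1, 2)]
    d = min(np.hypot(sx - x, sy - y) for x, y in ints)
    diag = np.hypot(width, height)
    return float(np.exp(-8.0 * d / diag))  # 1 when exactly on an intersection
```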
6 Classification and regression

In Sections 3, 4 and 5, we have presented the three groups of features used in our analysis. In fact, we have employed basic features that contain colour features and sharpness and blur features. Also, we have used image layout features, which include simplicity and repetitions and patterns. Then, we have utilised image composition features such as symmetry, leading lines, DoF and ROTs. These groups of features are then combined and manipulated by several data mining processes in order to evaluate image memorability.

Generally, data mining is an analytic process for data exploration and evaluation [37]. This process consists of analysing data, then looking for consistent or systematic relationships between variables/features, and finally creating predictive perspectives [37]. Actually, data mining goals are achieved through many tasks, such as classification and regression. In the following, we present the regression and classification methods used in our image memorability analysis.

6.1 Classification methods

Classification is a learning function used to predict the class to which a new instance belongs. We use a data set for training where each instance belongs to a specific known class [37]. Thus, the classification process aims to find the descriptive data model that distinguishes data classes by using a set of training data. We use three classification methods among those found in the literature: the support vector machine (SVM) [38], the radial basis function (RBF) network classifier [39] and the J48 decision tree classifier [40]. These methods are used to evaluate the interaction between image memorability classes and the extracted features.

Fig. 8 Sample images from the MIT image memorability dataset [1]. The images are sorted from the most memorable (top left) to the least memorable
(bottom right)

6.1.1 Support vector machine: The SVM is a supervised machine learning algorithm used in classification and regression problems [38]. This algorithm performs classification tasks by constructing the hyperplanes that separate different class labels in a multidimensional space. The SVM is designed for binary classification problems; in the case of multi-class classification, the solution is to transform the single multiclass problem into several binary classification problems [38].

Actually, the SVM uses a set of mathematical functions that are defined as the kernel. The function of the kernel is to take data as input and transform it into the required form. Different SVM algorithms use different types of kernel functions, such as linear, non-linear, polynomial, RBF and sigmoid. The kernel functions return the inner product between two points in a suitable feature space. Thus, by defining a notion of similarity, we get little computational cost even in very high-dimensional spaces.

6.1.2 RBF network classifier: The RBF network classifier is a type of neural network classifier [39]. The RBF classifier computes the input's similarity to the examples from the training set. Each RBF neuron stores a 'prototype', which is an example from the training set. Each neuron computes the Euclidean distance between the input and its prototype to classify a new input. The RBF classifier uses one or multiple numeric inputs and generates one or multiple numeric outputs. The output values are determined by using the input values and a set of parameters (RBF centroids, RBF widths, RBF weights and RBF biases).

The RBF network uses RBFs as activation functions. The output of the network is a linear combination of RBFs of the inputs and neuron parameters. Thus, the activation function 'links' the units in a layer to the values of the units in the succeeding layer. For the output layer, the activation function is the identity function, and the output units are simply weighted sums of the hidden units.

6.1.3 J48 decision tree classifier: A decision tree is a predictive learning method. It is used to create a model that predicts a target variable by learning simple decision rules based on predictor features. The J48 classifier implements the C4.5 algorithm [40], which is used to generate a decision tree employed in classification. Actually, the J48 method builds a classification function represented by a tree, which starts from the root and ends with the leaves. This function discriminates examples according to their classes, based on the attributes considered the best among all the others according to a specific criterion.

6.2 Regression methods

Regression is a learning function used like a classification one. The two functions differ in the following way: in regression, we predict a value from a continuous set, whereas classification predicts the class to which an instance belongs. We use three regression approaches in our study: support vector regression (SVR) [38], the RBF network regressor [39] and the M5P tree [41]. These approaches are used to evaluate the interaction between image memorability scores and the extracted features.

6.2.1 Support vector regression: The SVR [38] uses the same principles as the SVM for classification, with only a few differences, since the output is a real number. Actually, the SVM can also be used as a regression method, maintaining the main ideas that characterise the algorithm: minimise the error and find the hyperplane which maximises the margin.

6.2.2 RBF network regressor: The RBF network regressor [39] is similar to the RBF classifier concept used in Section 6.1.2, with a slight difference: the RBF regressor is used in regression cases. So, it deals with targets/scores (continuous values) instead of classes (discrete values).

6.2.3 M5P tree: The M5P tree [41] is a decision tree used for regression. It is a reconstruction of Quinlan's M5 algorithm [42], employed to include the utilisation of trees in regression models. M5P combines a conventional decision tree with the possibility of linear regression functions at the nodes. This algorithm is a multivariate tree algorithm which is utilised for noise removal and error reduction.

7 Experimental set-up

In this section, we present our approach's experimental process and the methods used to evaluate the proposed features. For the experimental tasks, we use MATLAB to extract and calculate the proposed features, and the WEKA toolkit [43] to evaluate image memorability by using the selected features. Actually, the WEKA toolkit is a collection of data pre-processing tools, machine learning algorithms (classification, regression, clustering etc.) and visualisation tools.

7.1 Datasets

We evaluate the performance of our approach by using two different datasets: the Massachusetts Institute of Technology (MIT) image memorability dataset [1] and the LaMem dataset [12]. The MIT dataset contains 2222 images randomly sampled, with different scene categories, from the scene understanding (SUN) dataset [44]. Images are cropped and resized to 256 × 256, and each image has a specific memorability score. Fig. 8 shows sample images from the MIT image memorability dataset [1].

On the other side, the LaMem dataset contains 60,000 images sampled from a number of existing datasets such as SUN [44], the aesthetic visual analysis (AVA) dataset [45] and NUS eye fixation (NUSEF) [46]. This dataset includes different types of images. Fig. 9 shows sample images from the LaMem dataset.

Each image in both datasets has a specific memorability score. The memorability scores for the images were calculated using a 'visual memory game' realised with Amazon Mechanical Turk [47]. First, the game shows participants a sequence of images and asks them to press the space bar whenever they see a repeated image. Second, the responses of all participants are collected; then, the memorability scores are calculated as the percentage of correct detections.

Fig. 9 Sample images from the LaMem image memorability dataset [12]

Besides, the authors assess human consistency by dividing the participants' pool into two independent halves. They quantified how well memorability scores measured on the first half of participants matched memorability scores measured on the second half. Averaging over 25 random split-half trials, they calculated a Spearman's rank correlation ϱ between the two sets of scores. Thus, they get ϱ = 0.75 for the MIT dataset and ϱ = 0.68 for the LaMem dataset as the rank measures. The efficiency of an image memorability model is assessed by measuring how close the model's Spearman rank correlation is to these human consistency scores.

7.2 Methodology

7.2.1 Methodology description: In the first step, we extract basic, layout and composition features from the images used for training. So, we obtain a feature representation for each image. In the second step, we use classification or regression methods to train models which will be used to predict the memorability classes or memorability scores of test images. In the third and last step, we use these methods to predict memorability classes or scores. The performance evaluation is conducted in terms of the global average classification accuracy and the Spearman rank correlation.

In the literature, many approaches [1–4, 7–9, 11, 12] use the memorability scores and regression algorithms to analyse image memorability. In our approach, we use two machine-learning methods, regression and classification, and both are employed to assess image memorability.

The classification process aims to find the descriptive data model that distinguishes data classes by using a set of training data. We use memorability class labels for classification. The class labels are obtained by dividing the training dataset images into three different memorability classes by using their memorability scores: the most memorable, the typically memorable and the least memorable.
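A minimal sketch of this labelling step follows, assuming Python with NumPy; the text does not state the cut-off scores, so equal-frequency terciles are assumed here purely for illustration.

```python
# Minimal sketch: derive three memorability classes from continuous scores.
# Assumption: tercile cut-offs (the exact thresholds are not specified above).
import numpy as np

def label_classes(scores):
    lo, hi = np.quantile(scores, [1 / 3, 2 / 3])
    return np.where(scores < lo, "least",
           np.where(scores < hi, "typical", "most"))

scores = np.array([0.42, 0.81, 0.66, 0.90, 0.55, 0.73])
print(label_classes(scores))  # e.g. ['least' 'most' 'typical' ...]
```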
Regression is a learning function used like a classification one. The two functions differ in the following way: in regression, we predict a value from a continuous set, whereas classification predicts the class to which an instance belongs. In our case, the continuous set contains score values varying between 0 and 1.

Our approach for modelling image memorability is structurally different from existing approaches dealing with image memorability evaluation. Those approaches used the MIT dataset and regression for memorability prediction; the work [12] is the only one which used the LaMem dataset for memorability prediction. In our case, we construct features in accordance with some popular photography rules. We use the memorability scores and the extracted features to build the data that will be handled with classification and regression. Thus, we use the two datasets and we use both classification and regression; that is why we compare our classification results to 100% and our regression ones to the calculated human consistency.

We employ the cross-validation evaluation method to evaluate the performance of the extracted features and the machine-learning methods [48]. In this model evaluation method, the data set is divided into k subsets, and models are built k times. Each time, one of the k subsets is used as the test set and the other k − 1 subsets are put together to form a training set. Then, the average error across all k trials is computed. The advantage of this method is that it matters less how the data gets divided: every data point gets to be in a test set exactly once, and gets to be in a training set k − 1 times. In our analysis, we realise a cross-validation with k = 5 in order to test our models five times; each time, we use 80% of the image dataset for training and the remaining 20% for testing. We use Spearman's rank correlation for regression and the percentage of correctly classified cases for classification to measure our models' accuracies. We use the MIT memorability dataset to compare our regression results with others found in [1–4, 7–9, 11, 12]. Also, we use the LaMem dataset to compare our regression results with [12]. We compare the percentage of correctly classified cases to 100% in order to evaluate the performance of our classification models.
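The sketch below mirrors this evaluation loop, assuming Python with scikit-learn and SciPy as stand-ins for the WEKA set-up: 5-fold cross-validation of a support vector regressor, scored by Spearman's ϱ on each held-out fold.

```python
# Minimal sketch of 5-fold cross-validation scored by Spearman's rank
# correlation (assumptions: scikit-learn SVR as a stand-in for the WEKA
# regressors; X = feature matrix, y = memorability scores in [0, 1]).
import numpy as np
from scipy.stats import spearmanr
from sklearn.model_selection import KFold
from sklearn.svm import SVR

def cross_validated_rho(X, y, k=5, seed=0):
    rhos = []
    for train, test in KFold(n_splits=k, shuffle=True, random_state=seed).split(X):
        model = SVR(kernel="rbf").fit(X[train], y[train])
        rho, _ = spearmanr(model.predict(X[test]), y[test])
        rhos.append(rho)
    return float(np.mean(rhos))  # averaged over the k held-out folds
```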
Fig. 10 An overview of our approach algorithm

7.2.2 Algorithm outline: Our approach algorithm can be presented as follows (see Fig. 10): extract the basic, layout and composition features from the images; train the classification and regression models on these features and the memorability labels or scores; predict the memorability classes or scores of the test images; and calculate the performance measures through the cross-validation method.
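Putting the pieces together, the following illustrative glue assembles a feature matrix from the sketch functions defined in the earlier sections and feeds it to the cross-validation helper above; the paths and scores are assumed to be loaded from the MIT or LaMem data, and the feature set shown is only a subset of the full vector.

```python
# Illustrative end-to-end glue for the pipeline in Fig. 10, reusing the sketch
# functions defined above; 'paths' and 'scores' are assumed to come from the
# MIT or LaMem data, and only a subset of the paper's features is shown.
import numpy as np

def build_feature_matrix(paths):
    rows = []
    for p in paths:
        rows.append(np.concatenate([
            colour_features(p),          # basic features (Section 3)
            [laplacian_focus(p)],
            lbp_histogram(p),            # layout features (Section 4)
            [mirror_symmetry_score(p)],  # composition features (Section 5)
            line_features(p),
            [low_dof_measure(p)],
        ]))
    return np.vstack(rows)

# X = build_feature_matrix(paths)
# print(cross_validated_rho(X, np.asarray(scores)))
```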

8 Results and discussion

8.1 Results

We follow the methodology described in Section 7.2. The regression results are shown in Table 1. Actually, we aim to reach the human consistency ranks: ϱ = 0.75 with the MIT dataset and ϱ = 0.68 with the LaMem dataset. Thus, we present the results of each group of features (basic, layout and composition), the results of the features composing each group and the results of combining the different groups of features. The combination achieves the best results for both databases: ϱ = 0.6375 for MIT and ϱ = 0.6545 for LaMem. The obtained results (ϱ = 0.6375 and ϱ = 0.6545) represent efficient correlation ranks compared to the human correlation ranks (ϱ = 0.75 and ϱ = 0.68, respectively). We note that the results of the layout features exceed those of the composition features and the basic features. In fact, we get ϱ = 0.6219, ϱ = 0.6189 and ϱ = 0.5459 as the best results of the second, the third and the first groups, respectively, using the MIT dataset. Similarly, with the LaMem dataset, we get ϱ = 0.6445, ϱ = 0.6424 and ϱ = 0.6218 as the best results of the second, the third and the first groups, respectively. We remark that the simplicity features achieve the best ranks in comparison with the other features in both datasets.

In the case of classification, the objective is to detect the memorability class to which an image belongs with great precision, the target being a rate of 100%. Actually, we obtained the memorability classes from the memorability scores. We constituted three classes: images of low memorability, images of typical memorability and images of high memorability. The classification results are presented in Table 2. We present the experimental results of the used features, the results of the three feature groups and the results of their combination. The best classification rates are 68.34% for the MIT dataset and 82.92% for the LaMem dataset; these two rates represent interesting results compared to the best possible precision (100%). Analysing the feature group results, we note that the layout group gives better results (65.27% for MIT and 82.05% for LaMem) than the other two groups. The results of the layout features group are followed by the results of the composition features group and the basic features group, respectively. In addition, experimenting with each feature separately, we note that the simplicity features achieve the best rates using the two datasets (63.70% for MIT and 81.55% for LaMem).

8.2 Discussion

The analysis of the experimental results shows that the layout features (simplicity, repetition/pattern) reach the best regression and classification results. Therefore, an image with a simple background is more memorable than another one with a non-simple background. Also, repetitions of shapes/content within an image improve its memorability. Moreover, the ROT features give the best results compared to the other features in the same group (the composition group). The basic features (colour, sharpness/blur) achieve weaker results than the composition features and the layout features. Even so, the basic features improve the results of the combination of all these features (composition and layout features). The variation in the results from one group to another can be explained by the dimension of the feature vector used, which changes from one group of features to another. Also, it may be due to the difference between the types of information encoded in each group.

In addition, we notice that the LaMem dataset achieves results surpassing the MIT ones. Actually, LaMem contains nearly 60,000 images with diverse content and different sizes, while the MIT dataset contains only 2222 resized images. The difference in the number of images and in the image content may explain these differences in results. Nevertheless, the reached results are strong in comparison with the human consistency in both datasets. Actually, we achieve a rank of ϱ = 0.6375 as the best rank with the MIT dataset, compared to the human consistency ϱ = 0.75, and we realise a rank of ϱ = 0.6545 with the LaMem dataset, compared to the human consistency ϱ = 0.68.

A comparison between classification and regression methods shows that the RBF (classifier or regressor) is slightly higher than the SVM/SVR in terms of classification rate or correlation rank, but is much slower than the SVM/SVR in terms of processing time. In addition, the decision tree classifier results exceed those of the RBF and SVM methods using the LaMem dataset. This remark can be explained by the large number of image samples in this dataset, whereas the decision tree results using MIT only sometimes surpass the RBF and SVM results. The same observation holds for the results of the decision tree method in the regression process in comparison with the other results of RBF and SVR. In general, we note that the combination of features within groups, and moreover the combination of these groups, achieves better results.

In the literature, many studies [1, 3, 4, 8, 9, 11, 12] use regression models and image memorability scores to evaluate image memorability. Actually, the Spearman's rank correlation measure ϱ is used to compare the regression results with the human consistency rates (0.75 with the MIT dataset and 0.68 with the LaMem dataset).

Table 1 Regression results of the proposed features, the three groups of features and their combination used for image
memorability evaluation, based on MIT and LaMem datasets
Desc Colour Sharp/blu Symm Lines DOF ROT Patt/Rept Simpl Basic Compo Layout All
MIT dataset
SVR 0.5343 0.5051 0.5873 0.5852 0.5738 0.6052 0.6032 0.6167 0.5391 0.6151 0.6219 0.6362
RBF 0.5366 0.5052 0.5882 0.5872 0.5745 0.6155 0.5989 0.6021 0.5459 0.6189 0.6088 0.6375
Tree 0.5219 0.4964 0.5853 0.5845 0.5737 0.6102 0.5975 0.6028 0.5434 0.6156 0.6049 0.6261
LaMem dataset
SVR 0.5853 0.5796 0.6106 0.6001 0.5941 0.6341 0.6249 0.6352 0.6189 0.6385 0.6402 0.6502
RBF 0.5971 0.5806 0.6241 0.6028 0.5988 0.6388 0.6285 0.6406 0.6203 0.6391 0.6435 0.6535
Tree 0.6004 0.5822 0.6224 0.6085 0.6005 0.6405 0.6241 0.6436 0.6218 0.6424 0.6445 0.6545

Table 2 Classification results of the three proposed groups of features and their combination used for image memorability
evaluation, based on MIT and LaMem datasets
Desc Colour Sharp/blu Symm Lines DOF ROT Patt/Rept Simpl Compo Basic Layout All
MIT dataset
SVM, % 59.69 53.39 54.83 56.59 53.71 59.42 53.84 61.76 61.78 58.93 63.39 65.30
RBF, % 60.91 53.56 55.55 56.63 56.05 59.51 53.43 63.70 62.84 59.78 65.27 68.34
Tree, % 57.31 54.06 54.20 52.76 53.39 59.06 56.77 54.65 58.77 56.09 60.41 61.66
LaMem dataset
SVM, % 63.44 57.70 77.87 77.88 76.84 81.05 79.21 80.51 80.35 64.79 81.25 81.34
RBF, % 64.08 57.81 78.25 77.92 77.17 81.10 81.07 81.55 81.42 65.17 82.05 82.92
Tree, % 64.81 57.87 77.93 77.92 76.97 80.14 72.14 78.29 79.76 65.05 79.58 78.92

Table 3 Comparative results of Spearman's rank correlation measure of different memorability models found in literature
Isola et al. [1] Celikkale et al. [7] Mancas et al. [4] Khosla et al. [3] Kim et al. [9] Wang et al. [8] Peng et al. [11] Khosla et al. [12] Our best result
ϱ 0.46 0.47 0.48 0.50 0.58 0.49 0.52 0.63 0.6375

The aim is to determine how close the model's Spearman rank correlation is to these human consistency rates. Since the works found in the literature compare their results to the baseline study [1], we compare our results to these works in order to approach the human consistency of 0.75. The best feature combination proposed by Isola et al. [1] gives a result of ϱ = 0.46. Besides, the global and full models made by Khosla et al. [3] get a prediction of ϱ = 0.50; they employed more complex features to develop their approach, such as object and scene annotations describing the spatial layout, content and image aesthetic properties. The Kim et al. study [9] managed to achieve better performance (ϱ = 0.58) by using a large descriptor dimension. Celikkale et al.'s results [7] get a rank of ϱ = 0.47. However, while Kim et al. [9] reach a result within a small difference (0.17) of the human consistency score, our best result is even closer to that score (0.75). Wang et al. [8] get a result of ϱ = 0.49 by employing an attention-based model to predict memorability. Peng et al. [11] achieve a rank of ϱ = 0.52 by using an MAR model. Khosla et al. [12] reach a result of 0.63 by using fine-tuned deep features, which are expensive in terms of execution time and of the resources used in the calculation process. Table 3 summarises all the cited results.

Also, in our models, we employ class labels instead of scores, so we do not have a specific ranking for the Spearman rank correlation. However, if we suppose that the top classification rate is 100%, which is the perfect memorability classification, our experiments give results ranging between 53.39 and 82.92%. These accuracies are strong rates in terms of classification. Besides, these results are better than a previous result (64.84%) obtained by Lahrache et al. [13], who deal with image memorability as a classification problem.

9 Conclusion

This paper investigates the relationship between memorability and features inspired by photography rules. So, we have used three groups of features. The first one covers image basic features, which contain colour, sharpness and blur features. The second one represents image layout features, which are composed of simplicity and repetitions and patterns features. The third one depicts image composition features; it consists of leading lines, symmetry, ROTs and DoF features. Then, we examined the impact of various regression and classification methods on image memorability analysis. In summary, we have presented a novel approach to evaluate image memorability which uses different features related to photography rules. All the employed features give good results in terms of regression and classification. Experimental results show the effectiveness of using features inspired by photography rules and of employing several data mining approaches for memorability assessment. In future works, we will explore other rules of photography for memorability assessment. Besides, we will use other image datasets to evaluate our image memorability approach based on rules of photography.
10 References

[1] Isola, P., Xiao, J., Parikh, D., et al.: 'What makes a photograph memorable?', IEEE Trans. Pattern Anal. Mach. Intell., 2014, 36, (7), pp. 1469–1482
[2] Isola, P., Parikh, D., Torralba, A., et al.: 'Understanding the intrinsic memorability of images'. Advances in Neural Information Processing Systems, Granada, Spain, December 2011, pp. 2429–2437
[3] Khosla, A., Xiao, J., Torralba, A., et al.: 'Memorability of image regions'. Advances in Neural Information Processing Systems, Lake Tahoe, Nevada, United States, December 2012, vol. 25, pp. 305–313
[4] Mancas, M., Le Meur, O.: 'Memorability of natural scenes: the role of attention'. Proc. IEEE Int. Conf. Image Processing, Melbourne, Australia, September 2013, pp. 196–200
[5] Redies, C.: 'A universal model of esthetic perception based on the sensory coding of natural stimuli', Spat. Vis., 2007, 21, (1–2), pp. 97–117
[6] Liu, L., Chen, R., Wolf, L., et al.: 'Optimizing photo composition', Comput. Graph. Forum, 2010, 29, (2), pp. 469–478
[7] Celikkale, B., Erdem, A., Erdem, E.: 'Visual attention-driven spatial pooling for image memorability'. IEEE Computer Vision and Pattern Recognition Workshops, Portland, OR, USA, June 2013, pp. 976–983
[8] Wang, W., Sun, J., Li, J., et al.: 'Investigation on the influence of visual attention on image memorability'. Image and Graphics – 8th Int. Conf., Tianjin, China, August 2015, pp. 573–582
[9] Kim, J., Yoon, S., Pavlovic, V.: 'Relative spatial features for image memorability'. 21st ACM Int. Conf. Multimedia Proc., Barcelona, Spain, October 2013, pp. 761–764
[10] Bylinskii, Z., Isola, P., Bainbridge, C., et al.: 'Intrinsic and extrinsic effects on image memorability', Vis. Res., 2015, 116, pp. 165–178
[11] Peng, H., Li, K., Li, B., et al.: 'Predicting image memorability by multi-view adaptive regression'. 23rd ACM Int. Conf. Multimedia Proc., Brisbane, Australia, October 2015, pp. 1147–1150
[12] Khosla, A., Raju, A., Torralba, A., et al.: 'Understanding and predicting image memorability at a large scale'. IEEE Int. Conf. Computer Vision, Santiago, Chile, December 2015, pp. 2390–2398
[13] Lahrache, S., El Ouazzani, R., El Qadi, A.: 'Bag-of-features for image memorability evaluation', IET Comput. Vis., 2016, 10, (6), pp. 1–9
[14] Borkin, M., Azalea, A.V., Bylinskii, Z., et al.: 'What makes a visualization memorable?', IEEE Trans. Vis. Comput. Graph., 2013, 19, (12), pp. 2306–2315
[15] Han, J., Chen, C., Shao, L., et al.: 'Learning computational models of video memorability from fMRI brain imaging', IEEE Trans. Cybern., 2015, 45, (8), pp. 1692–1703
[16] Aydn, T., Smolic, A., Gross, M.: 'Automated aesthetic analysis of photographic images', IEEE Trans. Vis. Comput. Graph., 2015, 21, (1), pp. 31–42
[17] Lo, K.-Y., Liu, K.-H., Chen, C.: 'Intelligent photographing interface with on-device aesthetic quality assessment'. Computer Vision – ACCV Workshops, Daejeon, Korea, November 2012, pp. 533–544
[18] Ng, W.-S., Kao, H.-C., Yeh, C.-H., et al.: 'Automatic photo ranking based on esthetics rules of photography'. Technical report, National Chengchi University, Taipei, Taiwan, 2009
[19] Bora, D., Gupta, A., Khan, F.: 'Comparing the performance of L*A*B* and HSV color spaces with respect to color image segmentation', CoRR, abs/1506.01472, 2015, pp. 192–203
[20] Gao, X., Xin, J., Sato, T., et al.: 'Analysis of cross-cultural color emotion', Color Res. Appl., 2007, 32, (3), pp. 223–229
[21] Schanda, J.: 'CIE colorimetry', in Schanda, J. (Ed.): 'Colorimetry: understanding the CIE system' (John Wiley & Sons, Hoboken, NJ, 2007), pp. 25–78
[22] Crete, F., Dolmiere, T., Ladret, P., et al.: 'The blur effect: perception and estimation with a new no-reference perceptual blur metric'. Conf. Human Vision and Electronic Imaging XII, San Jose, CA, USA, January–February 2007, p. 64920I
[23] Pertuz, S., Puig, D., Garcia, M.A.: 'Analysis of focus measure operators for shape-from-focus', Pattern Recognit., 2013, 46, (5), pp. 1415–1432
[24] Yang, G., Nelson, B.: 'Wavelet-based autofocusing and unsupervised segmentation of microscopic images'. Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Las Vegas, Nevada, USA, October 2003, vol. 3, pp. 2143–2148
[25] Thelen, A., Frey, S., Hirsch, S., et al.: 'Improvements in shape-from-focus for holographic reconstructions with regard to focus operators, neighborhood size, and height value interpolation', IEEE Trans. Image Process., 2009, 18, (1), pp. 151–157
[26] Minhas, R., Mohammed, A., Wu, Q.: 'An efficient algorithm for focus measure computation in constant time', IEEE Trans. Circuits Syst. Video Technol., 2012, 22, (1), pp. 152–156
[27] Mai, L., Le, H., Niu, Y., et al.: 'Detecting rule of simplicity from photos'. ACM Multimedia, New York, NY, USA, October–November 2012, pp. 1149–1152
[28] Healey, C., Enns, J.: 'Attention and visual memory in visualization and computer graphics', IEEE Trans. Vis. Comput. Graph., 2012, 18, (7), pp. 1170–1188
[29] 'Matlab Central'. Available at http://www.mathworks.com/matlabcentral/fileexchange/36484-local-binary-patterns/, accessed April 2016

[30] Loy, G., Eklundh, O.: 'Detecting symmetry and symmetric constellations of features'. 9th European Conf. Computer Vision, Graz, Austria, May 2006, pp. 508–521
[31] 'Digital Photography School: how to use leading lines for better composition'. Available at http://digital-photography-school.com/, accessed April 2016
[32] Datta, R., Joshi, D., Li, J., et al.: 'Studying aesthetics in photographic images using a computational approach'. 9th European Conf. Computer Vision, Graz, Austria, May 2006, pp. 288–301
[33] Matas, J., Galambos, C., Kittler, J.: 'Robust detection of lines using the progressive probabilistic Hough transform', Comput. Vis. Image Underst., 2000, 78, (1), pp. 119–137
[34] Ballard, D.: 'Generalizing the Hough transform to detect arbitrary shapes', Pattern Recognit., 1981, 13, (2), pp. 111–122
[35] Dhar, S., Ordonez, V., Berg, T.: 'High level describable attributes for predicting aesthetics and interestingness'. Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, June 2011, pp. 1657–1664
[36] Mai, L., Le, H., Niu, Y., et al.: 'Rule of thirds detection from photograph'. IEEE Int. Symp. Multimedia, CA, USA, December 2011, pp. 91–96
[37] Fayyad, U., Shapiro, G.P., Smyth, P.: 'From data mining to knowledge discovery: an overview', Adv. Knowl. Discov. Data Min., 1996, pp. 1–34
[38] Cortes, C., Vapnik, V.: 'Support-vector networks', Mach. Learn., 1995, 20, (3), pp. 273–297
[39] Orr, M.: 'Introduction to radial basis function networks'. Technical Report 4/96, Centre for Cognitive Science, University of Edinburgh, 1996
[40] Salzberg, S.: 'C4.5: programs for machine learning by J. Ross Quinlan. Morgan Kaufmann Publishers, 1993', Mach. Learn., 1994, 16, (3), pp. 235–240
[41] Wang, Y., Witten, I.H.: 'Induction of model trees for predicting continuous classes'. Proc. Poster Papers of the European Conf. Machine Learning, University of Economics, Faculty of Informatics and Statistics, Prague, 1997
[42] Quinlan, R.: 'Learning with continuous classes'. Proc. 5th Australian Joint Conf. Artificial Intelligence, Hobart, Tasmania, November 1992, pp. 343–348
[43] Bouckaert, R., Frank, E., Hall, M., et al.: 'Weka manual for version 3-7-13'. Technical report, The University of Waikato, 2015
[44] Xiao, J., Hayes, J., Ehinger, K., et al.: 'SUN database: large-scale scene recognition from abbey to zoo'. Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, June 2010, pp. 3485–3492
[45] Murray, N., Marchesotti, L., Perronnin, F.: 'AVA: a large-scale database for aesthetic visual analysis'. Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, June 2012, pp. 2408–2415
[46] Ramanathan, S., Katti, H., Sebe, N., et al.: 'An eye fixation database for saliency detection in images'. 11th European Conf. Computer Vision, Heraklion, Crete, Greece, September 2010, pp. 30–43
[47] 'Amazon Mechanical Turk'. Available at https://www.mturk.com/mturk/welcome/, accessed June 2017
[48] Kohavi, R.: 'A study of cross-validation and bootstrap for accuracy estimation and model selection'. Proc. Fourteenth Int. Joint Conf. Artificial Intelligence (IJCAI), Montréal, Québec, Canada, August 1995, pp. 1137–1145
