

An Unsupervised Game-Theoretic Approach to Saliency Detection

Yu Zeng, Mengyang Feng, Huchuan Lu, and Ali Borji

arXiv:1708.02476v1 [cs.CV] 8 Aug 2017

Abstract—We propose a novel unsupervised game-theoretic salient object detection algorithm that does not require labeled training data. First, the saliency detection problem is formulated as a non-cooperative game, hereinafter referred to as the Saliency Game, in which image regions are players who choose to be "background" or "foreground" as their pure strategies. A payoff function is constructed by exploiting multiple cues and combining complementary features. Saliency maps are generated according to each region's strategy in the Nash equilibrium of the proposed Saliency Game. Second, we explore the complementary relationship between color and deep features and propose an Iterative Random Walk algorithm to combine the saliency maps produced by the Saliency Game using different features. The Iterative Random Walk allows sharing information across feature spaces and detecting objects that are otherwise very hard to detect. Extensive experiments over 6 challenging datasets demonstrate the superiority of our proposed unsupervised algorithm compared to several state-of-the-art supervised algorithms.

Fig. 1. Saliency detection results by different methods. (a) Input images, (b) Ground truth maps, (c) DRFI method [13], a supervised method based on handcrafted features, (d) MR method [36], an unsupervised method taking image boundary regions as background seeds, (e) Our method.

Y. Zeng is with the School of Information and Communication Engineering at Dalian University of Technology, Dalian, China. E-mail: zengyu@mail.dlut.edu.cn.
M. Feng is with the School of Information and Communication Engineering at Dalian University of Technology, Dalian, China. E-mail: mengyangfeng@gmail.com.
H. Lu is with the School of Information and Communication Engineering at Dalian University of Technology, Dalian, China. E-mail: lhchuan@dlut.edu.cn.
A. Borji is with the Center for Research in Computer Vision and the Computer Science Department at the University of Central Florida. E-mail: aliborji@gmail.com.
Manuscript received xx 2017.

I. INTRODUCTION

Saliency detection is a preprocessing step in computer vision which aims at finding salient objects in an image [2]. Saliency helps allocate computing resources to the most informative and striking objects in an image, rather than processing the background. This is very appealing for many computer vision tasks such as object tracking, image and video compression, video summarization, and image retrieval and classification. A lot of previous effort has been spent on this problem and has resulted in several methods [4], [5]. Yet, saliency detection in arbitrary images remains a very challenging task, in particular over images with several objects amidst high background clutter.

On the one hand, unsupervised methods are usually more economical than supervised ones because no training data are needed. But they usually require a prior hypothesis about salient objects, and their performance depends heavily on the reliability of the utilized prior. Take the recently popular label propagation approach as an example (e.g., [12][36][14][32]). First, seeds are selected according to some prior knowledge (e.g., the boundary background prior), and then labels are propagated from the seeds to unlabeled regions. These methods work well in most cases, but their results will be inaccurate if the seeds are wrongly chosen. For instance, when image boundary regions serve as background seeds, the output will be unsatisfactory if the salient objects touch the image boundary (see the first row of Figure 1(d)).

On the other hand, supervised methods are generally more effective. Compared with unsupervised methods based on heuristic rules, supervised methods can learn more representative properties of salient objects from numerous training images. The prime example is deep learning based methods [30][37][18][17]. Owing to their hierarchical architecture, deep neural networks (e.g., CNNs [16]) can learn high-level, semantically rich features. Consequently, these methods are able to detect semantically salient objects in complex backgrounds. However, off-line training a CNN needs a great deal of training data. As a result, using CNNs for saliency detection, although effective, is relatively less economical than unsupervised approaches.

In this paper, we attempt to overcome the aforementioned drawbacks. To begin with, the saliency detection problem is formulated as a Saliency Game among image regions. Our main motivation for formulating saliency and attention in a game-theoretic manner is the very essence of attention, which is the competition among objects to enter high-level processing. Most previous methods formulate saliency detection as minimizing one single energy function that incorporates saliency priors. Different image regions are often considered through adding terms in the energy function (e.g., [38]) or sequentially (e.g., [36]). If the priors are wrong, optimization of their energy function might lead to wrong results. In contrast,
we define one specific payoff function for each superpixel which incorporates multiple cues, including a spatial position prior, an objectness prior, and support from others. Adopting two independent priors makes the proposed method more robust, since when one prior is inappropriate, the other might work.

The goal of the proposed Saliency Game is to maximize the payoff of each player given the other players' strategies. This can be regarded as maximizing many competing objective functions simultaneously. The game equilibrium automatically provides a trade-off, so that when some image region cannot be assigned a right saliency value by optimizing one objective function (e.g., due to a misleading prior), optimization of the other objective functions might help to give it a right saliency value. This approach seems very natural for attention modeling and saliency detection, as features and objects also compete to capture our attention.

In addition, it is known that one main factor in the astonishing success of deep neural networks is their powerful ability to learn high-level, semantically-rich features. Using features extracted from a pre-trained CNN to build an unsupervised method seems a considerable option, as it allows utilizing the aforementioned strength while avoiding time-consuming training. However, rich semantic information comes at the cost of diluting image features through convolution and pooling layers. Due to this, we also use traditional color features as supplementary information. To make full use of these two complementary features for better detection results, we avoid taking the weighted sum of the raw results generated by the above Saliency Game in the two feature spaces. Instead, we further propose an Iterative Random Walk algorithm across the two feature spaces, the deep features and the traditional CIE-Lab color features, to refine the saliency maps. In every iteration of the Iterative Random Walk, the propagations in the two feature spaces are penalized by the latest output of each other. Figure 2 shows the pipeline of our algorithm.

Fig. 2. The pipeline of our algorithm. The input image is segmented into superpixels at several scales. The following process is applied to each scale, and the average of the results at all scales is taken as the final saliency map. First, the Saliency Game among superpixels is formulated in two feature spaces ((a) and (b)), respectively, to generate corresponding results ((c) and (d)). Second, an Iterative Random Walk is constructed to refine the results of the Saliency Game. Then outputs (e) and (f) are summed (with weights) as the result at one scale. In the Iterative Random Walk, the complete affinity metric ((g) in the deep feature space and (h) in the color space) in one feature space is fused with the neighboring affinity metric ((i) in the color space and (j) in the deep space) in the other feature space. This is done to exploit the complementary relationship between the two feature spaces. Finally, we average the results at all scales to form the final saliency map.

In a nutshell, the main contributions of our work include:
1) We propose a novel unsupervised Saliency Game to detect salient objects. Adopting two independent priors improves robustness, and the nature of game equilibria assures accuracy even when both priors are unsatisfactory,
2) Utilizing semantically-rich features extracted from pre-trained fully convolutional networks (FCNs) [22], in an unsupervised manner, the proposed method is able to identify salient objects in complex scenes, where traditional methods that use handcrafted features may fail (see Figure 1(c)), and
3) An Iterative Random Walk algorithm across two feature spaces is proposed that takes advantage of the complementary relationship between the color feature space and the deep feature space to further refine the results.

II. RELATED WORK

Some saliency works have followed an unsupervised approach. In [12], the saliency of each region was defined as its absorbed time from boundary nodes, which measures its global similarity with all boundary regions. Yang et al. [36] ranked the similarity of image regions with foreground or background cues via graph-based manifold ranking. The saliency value of each image element was determined based on its relevance to given seeds. In [14], saliency patterns were mined to find foreground seeds according to prior maps, and foreground labels were then propagated to unlabeled regions. Tong et al. [27] proposed a learning algorithm to bootstrap training samples generated from prior maps. These methods exploited either a boundary background prior or a foreground prior from a prior map, while we adopt two different priors in our method for robustness purposes. The priors only act as weak guidance with very small weights in the payoff function of our proposed Saliency Game.

Some deep learning based saliency detection methods have achieved great performance. In [30], two deep neural networks were trained, one to extract local features and the other to conduct a global search. Zhao et al. [37] proposed a multi-context deep neural network taking both global and local context into consideration. Li et al. [18] explored high-quality visual features extracted from deep neural networks to improve the accuracy of saliency detection. In [17], high-level deep features and low-level handcrafted features were integrated in a unified deep learning framework for saliency detection. All the above methods needed a lot of time and many images for training. In this work, we are not against CNN models; rather, we combine deep features with traditional color features in an unsupervised way, which results in an efficient unsupervised method, complementary to CNNs, that is on par with the above models that need labeled training data. Hopefully, this will encourage new models that can utilize both labeled and unlabeled data.

Furthermore, there are many computer vision and learning tasks in which game theory has been applied successfully. A grouping game among data points was proposed in [28]. Albarelli et al. [3] proposed a non-cooperative game between two sets of objects to be matched. A game between a region-based segmentation model and a boundary-based segmentation model was proposed in [6] to integrate the two sub-modules. Erdem et al. [9] formulated a multi-player game for transduction learning, whereby equilibria correspond to consistent labelings of the data. In [23], Miller et al. showed that the relaxation labeling problem [11] is equivalent to finding Nash equilibria of polymatrix n-person games. However, to the best of our knowledge, game theory has not yet been used for salient object detection.

III. DEFINITIONS AND SYMBOLS

In this section, we introduce definitions and symbols that will be used throughout the paper.

Superpixels: In our model, the processing units are superpixels segmented from an input image by the SLIC algorithm [2]. I = {1, 2, 3, ..., N} denotes the enumeration of the set of superpixels. P_i(x, y), i ∈ I, denotes the mask of the i-th superpixel, where P_i(x, y) = 1 indicates that the pixel located at (x, y) of the input image belongs to the i-th superpixel, and P_i(x, y) = 0, otherwise.
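The paper's reference implementation is in MATLAB; purely as an illustration, the following Python sketch builds such superpixel masks using the SLIC implementation in scikit-image (an assumed stand-in for the segmentation code used by the authors; the function name and n_segments value are ours):

```python
import numpy as np
from skimage.segmentation import slic

def superpixel_masks(img, n_segments=200):
    """Segment an image into SLIC superpixels and return their binary masks P_i."""
    labels = slic(img, n_segments=n_segments)           # (H, W) superpixel label map
    masks = [(labels == k) for k in np.unique(labels)]  # masks[i](x, y) == True inside P_i
    return labels, masks
```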
Features: We use FCN-32s [22] features due to their great success in semantic segmentation. We choose the output of
the Conv5 layer as the feature maps of the input image. This is because features in the last layers of CNNs encode the semantic abstraction of objects and are robust to appearance variations. Since the feature maps and the image are not of the same resolution, we resize the feature maps to the input image size via linear interpolation. We denote the affinity between the i-th superpixel and the j-th superpixel in the deep feature space as A^d(i,j), which is defined to be their Gaussian-weighted Euclidean distance:

    A^d(i,j) = exp( −‖f_i^d − f_j^d‖² / σ² ),   (1)

where f_i^d is the deep feature vector of superpixel i. Each superpixel is represented by the mean deep feature vector of all its contained pixels.

The semantically-rich deep features can help accurately locate the targets but fail to describe low-level information. Therefore, we also employ color features as a complement to the deep features. Inspired by [12], we use CIE-Lab color histograms to describe the superpixels' color appearance. With the CIE-Lab color space divided into 8³ ranges, the color feature vector of the i-th superpixel is denoted as f_i^c. The affinity between superpixels i and j in the color feature space is denoted as A^c(i,j), which is defined to be their Gaussian-weighted chi-square distance:

    A^c(i,j) = exp( −χ²(f_i^c, f_j^c) / σ² ).   (2)
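To make Eqns. 1 and 2 concrete, here is a minimal NumPy sketch of the two affinity matrices. The function names are ours, σ = 0.1 follows the parameter setting in Section VI-A, and the 1/2 factor in the chi-square distance is one common convention that the paper does not spell out:

```python
import numpy as np

def deep_affinity(feat, sigma=0.1):
    """A^d of Eqn. 1. feat: (N, D) array, row i is the mean Conv5 feature of superpixel i."""
    d2 = np.square(feat[:, None, :] - feat[None, :, :]).sum(-1)  # pairwise squared distances
    return np.exp(-d2 / sigma ** 2)

def color_affinity(hist, sigma=0.1, eps=1e-12):
    """A^c of Eqn. 2. hist: (N, 512) CIE-Lab histograms (8^3 bins), one row per superpixel."""
    num = np.square(hist[:, None, :] - hist[None, :, :])
    chi2 = 0.5 * (num / (hist[:, None, :] + hist[None, :, :] + eps)).sum(-1)
    return np.exp(-chi2 / sigma ** 2)
```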
Neighbor: We adopt the definition of the 2-hop neighborhood, which is frequently used in superpixel-based saliency detection methods. The set of the i-th superpixel's neighbors is denoted as N(i) = N_1(i) ∪ N_2(i) ∪ N_3(i), where N_1(i) indicates the set of superpixels that share at least one common edge with the i-th superpixel. N_2(i) and N_3(i) are defined as follows:

    N_2(i) = { j | j ∈ N_1(k), k ∈ N_1(i), j ≠ i },   (3)

    N_3(i) = { ∅                       if i ∉ B
               { j | j ∈ B, j ≠ i }    if i ∈ B,   (4)

where B denotes the set of superpixels on the image boundary.
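A small sketch of this neighborhood construction (the names are ours; adj is assumed to hold the edge-sharing neighbor sets N_1):

```python
def neighbor_sets(adj, boundary):
    """N(i) = N1(i) ∪ N2(i) ∪ N3(i). adj: list of sets, adj[i] = N1(i); boundary: set B."""
    nbrs = []
    for i in range(len(adj)):
        n1 = set(adj[i])
        n2 = {j for k in n1 for j in adj[k] if j != i}                     # Eqn. 3
        n3 = {j for j in boundary if j != i} if i in boundary else set()   # Eqn. 4
        nbrs.append(n1 | n2 | n3)
    return nbrs
```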
IV. SALIENCY GAME

Here, we formulate a non-cooperative game among superpixels to detect salient objects in an input image. The input image is first segmented into N superpixels, which act as the players of the Saliency Game. Each player chooses to be "background" or "foreground" as its pure strategy, and its mixed strategy corresponds to this superpixel's saliency value. After showing their strategies, players obtain payoffs according to both their own and the other players' strategies. The payoff is determined by a payoff function which incorporates position and objectness cues as well as support from others. We use each player's mixed strategy in the Nash equilibrium of the proposed Saliency Game as the saliency value of this superpixel in the output saliency map. Such an equilibrium corresponds to a steady state where each player plays a strategy that maximizes its own payoff while the remaining players' strategies are kept fixed, which provides a globally plausible saliency detection result.

A. Game setting

The pure strategy set is denoted as S = {0, 1}, indicating "to be background" or "to be foreground", respectively. All superpixels' pure strategies are collectively called a pure strategy profile, denoted as s = (s_1, ..., s_N). The strategy
profile set is denoted as Θ. π_ij(s_i, s_j) denotes the single payoff that superpixel i obtains when playing pure strategy s_i against superpixel j, who holds a pure strategy s_j, in their 2-person game. There are four possible values of π_ij(s_i, s_j), which can be put into a 2 × 2 matrix B_ij,

    B_ij = [ π_ij(0,0)  π_ij(0,1)
             π_ij(1,0)  π_ij(1,1) ].   (5)

The payoff of superpixel i in pure strategy profile s, where ∀j ∈ I the j-th superpixel's pure strategy s_j is the j-th component of the vector s, is denoted as π_i(s). The payoff of superpixel i when it adopts a pure strategy t_i (not necessarily the i-th component of s), while all other superpixels adopt the pure strategies in pure strategy profile s, is denoted as π_i(t_i, s_−i). We make the assumption that the total payoff of superpixel i for playing with all others is the summation of the payoffs for playing 2-player games with every other single superpixel. Formally, we assume that π_i(s) = Σ_{j≠i} π_ij(s_i, s_j) and π_i(t_i, s_−i) = Σ_{j≠i} π_ij(t_i, s_j).

A pure best reply for player i against a pure strategy profile s is a pure strategy such that no other pure strategy gives a higher payoff to i against s. The i-th player's pure best-reply correspondence, which maps each pure strategy profile s ∈ Θ to a pure strategy s_i ∈ S, is denoted as β_i : Θ → S:

    β_i(s) = { s_i ∈ S | π_i(s_i, s_−i) ≥ π_i(t_i, s_−i), ∀t_i ∈ S }.   (6)

The combined pure best-reply correspondence β : Θ → Θ is defined as the Cartesian product of all players' pure best-reply correspondences:

    β(s) = ×_{i∈I} β_i(s) ⊂ Θ.   (7)

A pure strategy profile s is a pure Nash equilibrium if s ∈ β(s).

A probability distribution over the pure strategy set is termed a mixed strategy. The mixed strategy of the i-th superpixel is denoted as a 2-dimensional vector z_i = (z_i0, z_i1)^T, where z_i0 = P(s_i = 0), z_i1 = P(s_i = 1) and z_i0 + z_i1 = 1. The set of mixed strategies is denoted as ∆. A pure strategy can thereby be regarded as an extreme mixed strategy in which one component is 1 and the other is 0; e.g., the i-th player's pure strategy s_i = 1 is equivalent to its mixed strategy z_i = (0, 1)^T because P(s_i = 0) = 0 and P(s_i = 1) = 1. Correspondingly, the expected payoff of superpixel i for playing mixed strategy z_i against superpixel j holding mixed strategy z_j is denoted as u_ij(z_i, z_j) = z_i^T B_ij z_j. We also denote Z = (z_1, ..., z_N), u_i(Z) = Σ_{j≠i} u_ij(z_i, z_j) and u_i(w_i, Z_−i) = Σ_{j≠i} u_ij(w_i, z_j) as the mixed strategy versions of π_i(s) and π_i(t_i, s_−i). Similarly, a mixed Nash equilibrium is defined to be a mixed strategy profile which is a mixed best reply to itself. These symbols and definitions are not stated here individually due to limited space.

From the definition of the Nash equilibrium above, it can be inferred that in a Nash equilibrium of a game, each player adopts a strategy that maximizes its own payoff when the other players' strategies are fixed.

B. Payoff function

We have assumed in Section IV-A that the total payoff of superpixel i for playing with all others is the summation of every single payoff in its 2-person games with every other superpixel. Hence, here we focus on modeling the payoff of every 2-person game. We define the payoff π_ij(s_i, s_j) of superpixel i for its 2-person game with j as a weighted sum of three terms:

    π_ij(s_i, s_j) = λ_1 · pos_i(s_i) + λ_2 · obj_i(s_i) + spt_ij(s_i, s_j),   (8)

where pos_i(s_i), obj_i(s_i), and spt_ij(s_i, s_j) indicate the i-th superpixel's position prior, its objectness prior, and the support that superpixel j gives to superpixel i, respectively. λ_1 and λ_2 are parameters controlling the weights of the first two terms.

Position: The position prior term of the payoff function is formulated based on the observation that salient objects often fall near the image center. The position term should give a greater payoff when, a) center superpixels choose to be foreground, and b) boundary superpixels choose to be background. Taking (x_0, y_0) to be the image center and (x_i, y_i) to be the center coordinates of superpixel i, the position prior term is defined as follows,

    pos_i(s_i) = { (1/N) · exp(−((x_i − x_0)² + (y_i − y_0)²)/σ)          if s_i = 1
                   (1/N) · (1 − exp(−((x_i − x_0)² + (y_i − y_0)²)/σ))    if s_i = 0.   (9)

Objectness: Generally, objects attract more attention than background clutter. Hence, superpixels that are part of an object are more likely to be salient. The objectness term should give a greater payoff when, a) superpixels with high objectness choose to be foreground, and b) superpixels with low objectness choose to be background.

We exploit the geodesic object proposal (GOP) [15] method to extract a set of object segmentations, and define the objectness of a superpixel according to its overlap with all GOP proposals as follows:

    obj_i(s_i) = { (1/(N·N_o)) Σ_{j=1}^{N_o} [ Σ_{x,y} O_j(x,y) · P_i(x,y) / Σ_{x,y} P_i(x,y) ]            if s_i = 1
                   (1/N) · (1 − (1/N_o) Σ_{j=1}^{N_o} [ Σ_{x,y} O_j(x,y) · P_i(x,y) / Σ_{x,y} P_i(x,y) ])   otherwise,   (10)

where {O_j}_{j=1}^{N_o} is the set of object candidate masks generated by the GOP method, with O_j(x, y) = 1 indicating that the pixel located at (x, y) of the input image belongs to the j-th object proposal, and O_j(x, y) = 0 otherwise. P_i is the mask of the i-th superpixel as in Section III.
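The two prior terms follow directly from Eqns. 9 and 10. The sketch below is one possible NumPy implementation (function names are ours); each function returns an (N, 2) array whose columns correspond to the pure strategies s = 0 and s = 1:

```python
import numpy as np

def position_prior(centers, img_center, sigma):
    """pos_i(s) of Eqn. 9. centers: (N, 2) superpixel center coordinates (x, y)."""
    N = len(centers)
    d2 = np.square(centers - np.asarray(img_center)).sum(axis=1)
    p1 = np.exp(-d2 / sigma)          # note: Eqn. 9 divides by sigma, not sigma^2
    return np.stack([(1.0 - p1) / N, p1 / N], axis=1)

def objectness_prior(proposals, sp_masks):
    """obj_i(s) of Eqn. 10. proposals: (No, H, W) masks O_j; sp_masks: (N, H, W) masks P_i."""
    P = np.asarray(sp_masks, float).reshape(len(sp_masks), -1)
    O = np.asarray(proposals, float).reshape(len(proposals), -1)
    overlap = (P @ O.T) / P.sum(axis=1, keepdims=True)  # fraction of P_i covered by each O_j
    frac = overlap.mean(axis=1)                         # averaged over the No proposals
    N = len(sp_masks)
    return np.stack([(1.0 - frac) / N, frac / N], axis=1)
```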
Support: With a much larger weight in the payoff function (λ_1 and λ_2 being small), support from others is the main source of payoff obtained by each superpixel. When playing with an opponent, each superpixel judges whether the opponent's strategy is right or wrong from its own stance, and accordingly provides a higher, lower or even negative support to the opponent. More precisely,
• Each superpixel takes a neutral attitude toward opponents who hold pure strategies different from its own, and provides them zero support.
• If an opponent adopts the same pure strategy as superpixel i,
  – if the opponent is similar to it, then superpixel i provides the opponent a great support in recognition of its choice;
  – else, if the opponent is not similar to it, then superpixel i provides the opponent a small or even
negative support as punishment.

Formally, the support term is defined as follows,

    spt_ij(s_i, s_j) = { A(i,j) − (α/N) Σ_{k=1}^N A(i,k)   if s_i = s_j
                         0                                 if s_i ≠ s_j,   (11)

where α is a positive constant and A(i,j) is the affinity between superpixels i and j, defined as A^c and A^d in Section III.
i obtains by playing a 2-person pure strategy game with Game, only color feature is used in the three shown cases.
superpixel j. The expected payoff ui (wi , Z−i ) that superpixel
i obtains by playing mixed strategy game with all others V. I TERATIVE RANDOM WALK
adopting strategies in mixed strategy profile Z can be given Traditional color features are of high-resolution, so saliency
based on the definition stated at the beginning of this section, maps generated in color space are detailed and with clear
N
X edges. But due to lack of high-level information, sometimes
ui (wi , Z−i ) = wiT Bij zj . (12)
they fail to locate the targets accurately (see Figure 4(c)). On
j=1∧j6=i
the contrary, since deep features encode high-level concept of
C. Computing equilibrium objects well, saliency maps generated in the deep feature space
are able to find correct salient objects in an image. But due to
We use Replicator Dynamics [26] to compute the mixed several layers of convolution and pooling, these features are
strategy Nash equilibrium of the proposed Saliency Game. too coarse. Thus the generated saliency maps are indistinctive,
In Replicator Dynamics, a population of individuals play the as shown in Figure 4(d).
game, generation after generation. A selection process acts Accordingly, here, we use both complementary features for
on the population, causing the number of users holding fitter a better result. However, as shown in Figure 4(e), although
strategies to grow faster. We use discrete time Replicator the weighted sum of the two is slightly better, they are not
Dynamics to find the equilibrium of the game, iterating until satisfactory. To solve this problem, in this section, inspired by
∀i ∈ I, |zi (t) − zi (t − 1)| < , the metric fusion presented in [29], we propose an Iterative
const + ui (eh , Z(t)−i ) Random Walk method to best exploit this two complementary
zih (t + 1) = zih (t) , (13)
const + ui (Z(t)) feature spaces. In the proposed Iterative Random Walk, metrics
where zih (t) represents the h-th component of the i-th player’s in the two feature spaces are fused as stated in [29] (corss
mixed strategy at time t, eh is a vector whose h-th component fusion in Eqn. 16 is the work of Tu et al. .), in addition, we
is 1, while other components are 0. We set the initial mixed also make the two propagation penalized by the latest output
strategies of player i to zi (0) = (0.5, 0.5), ∀i ∈ I. const of each other (cross propagation in Eqn. 17 and Eqn. 18 is
is background birthrate for an individual, which is set to our work). Figure 5 shows that both cross fusion and cross
a positive number to make sure const + ui (eh , Z(t)−i ) is penalization contribute.
positive for all i in I [34]. There could be multiple equilibria With superpixels as nodes, a neighbor graph and a complete
in a game, likewise in the proposed Saliency Game. Replicator graph are constructed in both feature space (deep and color
Dynamics might reach different Nash equilibria if the initial features). The affinity between two superpixels is assigned to
state zi (0) is set to different interior points of ∆. Empirically, the edge weight. Four weight matrices are defined:
we find that zi (0) = (0.5, 0.5) is a good initialization leading • W and W : weight matrices of neighbor and complete
d d

to plausible saliency detection. graphs in the deep feature space, respectively.


In the proposed Saliency Game, each superpixel inspects the strategies of all other superpixels and takes a stance by providing large, small or even negative support. Usually, no matter what strategy a superpixel adopts, there are both protesters and supporters. Game equilibria provide a good trade-off among the different influences. Thus, in the equilibrium of the proposed Saliency Game with the payoff function defined in Eqn. 8, each superpixel chooses the strategy that suits it best given its position, objectness, and support from others. Doing so has two advantages: 1) The center position prior and the objectness prior are almost independent; when one prior is unsatisfactory, the other may work. As shown in the first row of Figure 3, the little pug appears away from the image center, but since it is the only object in the image, the objectness prior identifies it correctly. 2) The two priors only serve as weak guidance and obtain small weights in the payoff function. Even when they are both unsatisfactory on some image regions, pressure from peers will impel these regions to obtain proper saliency values in the equilibrium of the game. As shown in the second row of Figure 3, although only the heads of the people score high in the position prior and the objectness prior, the produced saliency map highlights the entire object. From the third row of Figure 3, we can see that the proposed algorithm also suppresses the background effectively when a prior highlights background areas by mistake. Note that, in order to illustrate the effectiveness of the proposed Saliency Game, only the color feature is used in the three shown cases.

Fig. 3. The proposed Saliency Game still works well even when the position prior and objectness prior are not very satisfying. (a) Input images. (b) Ground truth maps. (c) and (d) Illustration of the position term and objectness term of Eqns. 9 and 10. (e) Saliency maps by the Saliency Game using the color feature.

V. ITERATIVE RANDOM WALK

Traditional color features are of high resolution, so saliency maps generated in the color space are detailed and have clear edges. But due to the lack of high-level information, they sometimes fail to locate the targets accurately (see Figure 4(c)). On the contrary, since deep features encode the high-level concept of objects well, saliency maps generated in the deep feature space are able to find the correct salient objects in an image. But due to several layers of convolution and pooling, these features are too coarse. Thus, the generated saliency maps are indistinct, as shown in Figure 4(d).

Accordingly, here we use both complementary features for a better result. However, as shown in Figure 4(e), although the weighted sum of the two is slightly better, the results are still not satisfactory. To solve this problem, in this section, inspired by the metric fusion presented in [29], we propose an Iterative Random Walk method to best exploit these two complementary feature spaces. In the proposed Iterative Random Walk, the metrics in the two feature spaces are fused as stated in [29] (the cross fusion in Eqn. 16 is the work of Tu et al.); in addition, we also make the two propagations penalized by the latest output of each other (the cross propagation in Eqn. 17 and Eqn. 18 is our work). Figure 5 shows that both cross fusion and cross penalization contribute.

With superpixels as nodes, a neighbor graph and a complete graph are constructed in both feature spaces (deep and color features). The affinity between two superpixels is assigned to the corresponding edge weight. Four weight matrices are defined:
• W^d and W̄^d: weight matrices of the neighbor and complete graphs in the deep feature space, respectively.
• W^c and W̄^c: weight matrices of the neighbor and complete graphs in the color space, respectively.
In the complete graphs, there is an edge between every pair of nodes, while in the neighbor graphs, each node is connected only to its neighbors. ∀i ∈ I, W̄^d and W^d are defined as follows,

    W̄^d(i,j) = A^d(i,j), ∀j ∈ I,   (14)

    W^d(i,j) = { A^d(i,j)   if j ∈ N(i)
                 0          if j ∉ N(i),   (15)

W̄^c and W^c are defined similarly, but using A^c. See Section III for the definitions of A^d and A^c.

Firstly, let P_(0)(i,j) = W(i,j) / Σ_{j=1}^N W(i,j), P̄_(0)(i,j) = W̄(i,j) / Σ_{j=1}^N W̄(i,j), and let t be the number of iterations. Symbols with superscript c or d correspond to variables in the color or deep feature space, respectively. Following [29], we fuse these four affinity matrices as follows,

    { P̄^d_(t+1) = P^c × P̄^d_(t) × (P^c)^T
    { P̄^c_(t+1) = P^d × P̄^c_(t) × (P^d)^T.   (16)

Then, using the fused affinity matrices, we let the propagation results in the two feature spaces penalize each other. Two random walk energy functions are defined as follows,

    E^d_(t+1)(l) = Σ_{i,j} P̄^d_(t)(i,j) (l_i − l_j)² + β Σ_{i=1}^N (l_i − l^c_{i(t)})²,   (17)

    E^c_(t+1)(l) = Σ_{i,j} P̄^c_(t)(i,j) (l_i − l_j)² + β Σ_{i=1}^N (l_i − l^d_{i(t)})²,   (18)

where l is the label vector, l_i is the i-th superpixel's label, and β is a parameter.

By minimizing the two energy functions above, we have,

    l^d_(t+1) = arg min_l E^d_(t+1)(l) = (L^d_(t+1) + βI)^(−1) l^c_(t),   (19)

    l^c_(t+1) = arg min_l E^c_(t+1)(l) = (L^c_(t+1) + βI)^(−1) l^d_(t),   (20)

where L is the corresponding Laplacian matrix. l^c_(0) and l^d_(0) are set to the results of the Saliency Game stated in Section IV. After T rounds, the iteration converges and the final saliency map is obtained as:

    S = ρ_1 · l^c_(T) + ρ_2 · l^d_(T),   (21)

where ρ_1 and ρ_2 control the weights of the two results.

As shown in Figure 4(f), through the Iterative Random Walk, information from the color space helps cut out the whole salient object clearly, while semantic information from the deep features helps locate the target object accurately. Also, objects that could not be detected in one feature space can be detected with the help of the results from the other feature space.

Fig. 4. Effect of the Iterative Random Walk. (a) Input images. (b) Ground truth maps. (c) Saliency maps generated by our Saliency Game in the color feature space. (d) Saliency maps generated by our Saliency Game in the deep feature space. (e) The weighted summation of (c) and (d). (f) Saliency maps after refinement by the Iterative Random Walk proposed in this section.
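The following sketch strings Eqns. 16-21 together. It is an illustration under our reading of the notation: row-normalized transition matrices, fusion with transposed neighbor matrices as in the metric fusion of [29], and an unnormalized graph Laplacian L = D − P̄; the extracted text does not pin these details down, so treat them as assumptions. Parameter values follow Section VI-A:

```python
import numpy as np

def iterative_random_walk(Wd_full, Wd_nb, Wc_full, Wc_nb, l_d0, l_c0,
                          beta=1.0, T=20, rho1=0.3, rho2=0.7):
    """Cross fusion (Eqn. 16) and cross propagation (Eqns. 17-21), as a sketch.
    *_full / *_nb: complete / neighbor weight matrices; l_d0, l_c0: (N,) Saliency
    Game results in the deep and color feature spaces."""
    row_norm = lambda W: W / W.sum(axis=1, keepdims=True)
    Pd, Pc = row_norm(Wd_nb), row_norm(Wc_nb)               # neighbor transitions
    Pd_bar, Pc_bar = row_norm(Wd_full), row_norm(Wc_full)   # complete transitions
    l_d, l_c = l_d0.copy(), l_c0.copy()
    I = np.eye(len(l_d0))
    for _ in range(T):
        # Metric cross fusion: the complete metric of one space is smoothed by
        # the neighbor metric of the other space (Eqn. 16).
        Pd_bar, Pc_bar = Pc @ Pd_bar @ Pc.T, Pd @ Pc_bar @ Pd.T
        Ld = np.diag(Pd_bar.sum(axis=1)) - Pd_bar           # graph Laplacians
        Lc = np.diag(Pc_bar.sum(axis=1)) - Pc_bar
        # Cross propagation: each space is penalized by the other's latest output
        # (Eqns. 19-20), updated simultaneously.
        l_d, l_c = (np.linalg.solve(Ld + beta * I, l_c),
                    np.linalg.solve(Lc + beta * I, l_d))
    return rho1 * l_c + rho2 * l_d                          # Eqn. 21
```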
VI. EXPERIMENTS AND RESULTS

In this section, we evaluate the proposed method on 6 benchmark datasets: ECSSD [35] (1000 images), PASCAL-S [20] (850 images), MSRA-5000 [21] (5000 images), HKU-IS [18] (4447 images), DUT-OMRON [36] and SOD [24]. We compare our algorithm with 11 state-of-the-art methods including BL [27], BSCA [25], DRFI [13], DSR [19], HS [35], LEGS [30], MCDL [37], MR [36], RC [7], wCO [38], and KSR [33]. The results of the different methods are provided by the authors or obtained by running the available code.

A. Parameter setting

All parameters are fixed once over all the datasets. We segment an image into 100, 150, 200, and 250 superpixels (i.e., four segmentation maps), run the algorithm on each map, and average the four outputs to form the final saliency map. σ is set to 0.1 and ε is set to 10^−4. The parameters controlling the weight of each term in the payoff function (Eqn. 8) are set to λ_1 = 2.1 × 10^−6 and λ_2 = 9 × 10^−7, respectively. α in Eqn. 11 is set to 0.007. β in Eqn. 19 and Eqn. 20 is set to 1. In Eqn. 21, we set T = 20, ρ_1 = 0.3 and ρ_2 = 0.7.

The proposed method is implemented in MATLAB on a PC with a 3.6GHz CPU and 32GB RAM. It takes about 2.3 seconds to generate a saliency map, excluding the time for deep feature extraction and superpixel segmentation.

B. Evaluation metrics

We use the precision-recall curve, the F-measure curve, the F-measure and the AUC to quantitatively evaluate the experimental results. The precision value is defined as the ratio of correctly assigned salient pixels to all salient pixels in the map to be evaluated, while the recall value corresponds to the percentage of detected salient pixels with respect to all salient pixels in the ground-truth map. The F-measure is an overall performance indicator computed as the weighted harmonic mean of precision and recall. We set β² = 0.3 as suggested in [1] to emphasize precision.

Given a saliency map with intensity values normalized to the range [0, 1], a series of binary maps is produced by using several fixed thresholds in [0, 1]. We compute the precision/recall pairs of all the binary maps to plot the precision-recall curves and the F-measure curves. As suggested in [1], we use twice the mean value of a saliency map as the threshold to generate the binary map for computing the F-measure. Notice that some works have reported slightly different F-measures obtained using different thresholds.
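As a minimal sketch, the adaptive-threshold F-measure described above can be computed as follows (β² = 0.3 as in [1]; the capping of the threshold at 1 and the zero-division guards are our own additions):

```python
import numpy as np

def f_measure(sal, gt, beta2=0.3):
    """F-measure with the adaptive threshold of [1]: twice the mean saliency value.
    sal: saliency map in [0, 1]; gt: binary ground-truth map."""
    binary = sal >= min(2.0 * sal.mean(), 1.0)
    tp = np.logical_and(binary, gt).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max(np.count_nonzero(gt), 1)
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)
```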
C. Algorithm validation

To demonstrate the effectiveness of each step of our algorithm, we test the proposed Saliency Game and the Iterative Random Walk (with and without metric fusion) separately on the ECSSD and PASCAL-S datasets. The PR curves in Figure 5 show that:
• The proposed Saliency Game achieves favorable performance. Note that even when using only simple color features, as a fully unsupervised method, our proposed Saliency Game (C in Figure 5) is comparable with supervised methods.
• The Iterative Random Walk improves performance in both the deep feature space (D in Figure 5) and the color space (CD(IRW)). Comparing the results of the Iterative Random Walk CD(IRW), the Iterative Random Walk without metric fusion CD(IRW*), and the weighted sum of the results in the two feature spaces CD(AVE) demonstrates the advantage of cross penalization and metric fusion in the Iterative Random Walk.

Fig. 5. Effect of each step of the algorithm. Left) PR curves on the ECSSD dataset. Right) PR curves on the PASCAL-S dataset. The performances are compared with three existing methods: MCDL [37], DRFI [13] and MR [36]. C(s1), C(s2), C(s3), C(s4): color feature Saliency Game at four different superpixel scales. C: color feature Saliency Game averaged over scales. D: deep feature Saliency Game averaged over scales. CD(AVE): a weighted sum of C and D. CD(IRW): Saliency Game refined by the Iterative Random Walk with metric cross fusion. CD(IRW*): Saliency Game refined by the Iterative Random Walk without metric cross fusion.

In addition, as an unsupervised approach, our method is economical and practical. Although some deep learning based methods outperform ours in a few cases, a lot of time and a large number of training samples are required to assure their effectiveness; otherwise, the performance of these methods might not be as good as ours. To demonstrate this, we show a comparison in terms of F-measure between our method and RFCN [31] fine-tuned on different numbers of training samples in Figure 6. RFCN is a recently proposed deep learning based method that has achieved excellent performance. However, it can be seen from the figure that RFCN does not do well without fine-tuning. Its F-measure increases as the number of training samples grows. Our method is equivalent to RFCN fine-tuned on about 5000-9000 images.

Fig. 6. Comparison in terms of F-measure between our method and RFCN fine-tuned on different numbers of training samples, evaluated on the ECSSD and HKU-IS datasets.

TABLE I
F-measure scores. The best and the second best results are shown in red and green, respectively. Supervised methods are marked in bold.

dataset      BL      DSR     HS      MR      RC      wCO
ECSSD        0.6838  0.6618  0.6347  0.6905  0.4560  0.6764
PASCAL-S     0.5742  0.5575  0.5314  0.5863  0.4039  0.5999
MSRA         0.7840  0.7841  0.7671  0.8041  0.5754  0.7937
HKU-IS       0.6597  0.6774  0.6359  0.6550  0.5008  0.6770
SOD          0.5723  0.5962  0.5212  0.5695  0.4184  0.5987
DUT-OMRON    0.4989  0.5243  0.5108  0.5280  0.4058  0.5277

dataset      BSCA    DRFI    MCDL    LEGS    KSR     Ours
ECSSD        0.7046  0.7329  0.7959  0.7851  0.7817  0.8215
PASCAL-S     0.6006  0.6182  0.6912  -       0.7039  0.7062
MSRA         0.7934  -       -       -       -       0.8666
HKU-IS       0.6545  0.7219  0.7573  0.7229  0.7468  0.8015
SOD          0.5835  0.6470  0.6772  0.6834  0.6679  0.6896
DUT-OMRON    0.5091  0.5505  0.6250  0.5916  0.5911  0.5981

D. Comparison with state-of-the-art methods

As shown in Figure 10, Figure 11, Table I and Table II, our proposed method compares favorably against 11 state-of-the-art approaches over six different datasets. Among the models, BL, BSCA, DSR, HS, MR, RC and wCO are unsupervised methods; DRFI, LEGS, MCDL and KSR are supervised methods. DRFI learns a random forest regressor, LEGS and MCDL train convolutional neural networks, and KSR learns a classifier and a subspace projection to rank object proposals based on R-CNN features. For a fair comparison, we do not provide evaluation results of the DRFI, LEGS, MCDL, and KSR methods on the MSRA-5000 dataset, since these methods all randomly select images from this dataset for training. Further, since LEGS also selects images from the PASCAL-S dataset, we do not show its performance over the PASCAL-S dataset. A visual comparison of the proposed method against the state-of-the-art on different datasets is shown in Figures 12, 13, 14 and 15.

E. Sensitivity analysis

In this section, we test the sensitivity of the proposed Saliency Game to the parameters λ_1, λ_2 and α, and the sensitivity of the Iterative Random Walk to the parameters β and ρ_1. As shown in Figures 7 and 8, the performance in terms of F-measure score stays almost the same when the parameters are varied slightly, so the proposed method is not sensitive to these parameters.

F. Equilibria

The proposed Saliency Game is a special category of games named polymatrix games [10], where each player plays a two-player game against each other and his payoff is then the sum of the payoffs from each of the two-player games [8]. Howson et al. [10] showed that every polymatrix game has at least one equilibrium. Therefore, the proposed Saliency Game also has at least one, but could have more than one, equilibrium. Replicator Dynamics is invoked to find a Nash equilibrium of the game, and different Nash equilibria might be reached if the initial state z_i(0) is set to different interior points of ∆. Empirically, we find that ∀i ∈ I, z_i(0) = (0.5, 0.5) is a good initialization leading to plausible saliency detection.
In this section, we show the saliency detection results corresponding to the other four Nash equilibria, reached by Replicator Dynamics starting from four different interior points of ∆. We denote the initial state used in the paper as V^half, and the other four initial states as V^bd, V^pos, V^obj, and V^prior. Each of them is a 2 × N matrix, whose i-th column vector (denoted as v_i^bd, v_i^pos, v_i^obj, v_i^prior, respectively) corresponds to the mixed strategy of superpixel i. The variables v_i^bd, v_i^pos, v_i^obj and v_i^prior are set as follows:

    v_i^{bd,1} = { 0.4 if i ∈ B; 0.5 otherwise },  v_i^{bd,0} = 1 − v_i^{bd,1};   (22)

    v_i^{pos,1} = N · pos_i(1),  v_i^{pos,0} = N · pos_i(0);   (23)

    v_i^{obj,1} = N · obj_i(1),  v_i^{obj,0} = N · obj_i(0);   (24)

    v_i^{prior,1} = prior_i,  v_i^{prior,0} = 1 − prior_i;   (25)

where prior_i is the saliency value of superpixel i computed by another saliency detection method. In this experiment, we use the MR [36] model to compute prior_i, ∀i ∈ I. We try the four different initial states to test whether inducing prior knowledge into the initial state leads to a better saliency detection result. Each of the five different Nash equilibria corresponds to a saliency detection result. We show the quantitative comparison of the five different results in terms of F-measure curves and PR curves in Figure 9. It can be seen that the initial state V^half without any prior knowledge, which is adopted in the paper, leads to the best saliency detection.

Fig. 7. Sensitivity of the Saliency Game to parameters α, λ_1 and λ_2, evaluated on the ECSSD dataset.

Fig. 8. Sensitivity of the Iterative Random Walk to parameters β and ρ_2, evaluated on the ECSSD dataset.

Fig. 9. Comparison of the saliency detection results corresponding to five different Nash equilibria (evaluated on the ECSSD dataset). V^half, V^pos, V^obj, V^bd and V^prior: saliency detection results corresponding to the five Nash equilibria reached by Replicator Dynamics starting from the initial states V^half, V^pos, V^obj, V^bd and V^prior, respectively.

TABLE II
AUC scores. Supervised methods are marked in bold.

dataset      BL      DSR     HS      MR      RC      wCO
ECSSD        0.9143  0.8619  0.8838  0.8827  0.8342  0.8814
PASCAL-S     0.8671  0.8118  0.8362  0.8259  0.8139  0.8482
MSRA         0.9535  0.9382  0.9279  0.9267  0.8951  0.9360
HKU-IS       0.9140  0.9008  0.8782  0.8611  0.8530  0.8952
SOD          0.8503  0.8208  0.8145  0.7903  0.7924  0.8026
DUT-OMRON    0.8778  0.8787  0.8586  0.8447  0.8476  0.8846

dataset      BSCA    DRFI    MCDL    LEGS    KSR     Ours
ECSSD        0.9176  0.9404  0.9186  0.9235  0.9268  0.9272
PASCAL-S     0.8665  0.8950  0.8699  -       0.9012  0.8724
MSRA         0.9484  -       -       -       -       0.9583
HKU-IS       0.9140  0.9435  0.9175  0.9026  0.9099  0.9183
SOD          0.8358  0.8813  0.8163  0.8268  0.8403  0.8481
DUT-OMRON    0.8779  0.9157  0.9014  0.8841  0.8921  0.8869
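For reference, the four alternative initializations of Eqns. 22-25 can be assembled as follows (a sketch; pos and obj are the prior terms of Eqns. 9-10 and prior holds the MR [36] saliency values):

```python
import numpy as np

def initial_states(N, boundary, pos, obj, prior):
    """Five initial mixed-strategy profiles; each value is an (N, 2) array [z_i0, z_i1]."""
    in_b = np.isin(np.arange(N), list(boundary))
    bd1 = np.where(in_b, 0.4, 0.5)                          # Eqn. 22
    return {
        'half':  np.full((N, 2), 0.5),                      # the default used in the paper
        'bd':    np.stack([1.0 - bd1, bd1], axis=1),
        'pos':   N * pos,                                   # Eqn. 23 (rows sum to 1)
        'obj':   N * obj,                                   # Eqn. 24
        'prior': np.stack([1.0 - prior, prior], axis=1),    # Eqn. 25
    }
```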
VII. SUMMARY AND CONCLUSION

We propose a novel saliency detection algorithm. Firstly, we formulate a Saliency Game among superpixels, and a saliency map is generated according to each region's strategy in the Nash equilibrium of the proposed Saliency Game. Secondly, an Iterative Random Walk that combines a deep feature and a color feature is constructed to refine the saliency maps generated in the previous step. Extensive experiments over six benchmark datasets demonstrate that the proposed algorithm achieves favorable performance against state-of-the-art methods. The sensitivity analysis shows the robustness of the proposed method to parameter changes.

Different from most previous methods, which formulate saliency detection as minimizing one single energy function, the game-theoretic approach can be regarded as maximizing many competing objective functions simultaneously. The game equilibrium automatically provides a trade-off. This seems very natural for attention modeling and saliency detection, as features and objects also compete to capture our attention.
#726
0.8 0.6 0.6 0.6 0.6

Preci

Preci
Preci

Preci
Preci

Preci
0100.7Precision
010
0.5 ICCV 2017 Submission #726. CONFIDENTIAL
0.5
0.5
0.5
REVIEW COPY. DO NOT DISTRIBUTE.
0.5 0.5
0.9
0110.61 IEEE TRANSACTIONS011
Paper ID 726 Paper ID 726
0.4 0.4
SUBMITTED TO0.9 ON IMAGE PROCESSING, VOL. XX, NO. XX, XX 9
0.4 1 1 1
0.4 0.4 0.4
0.3 0.3
0120.90.5 012
0.9 0.9 0.9
0.3 0.3 0.3
0.2 0.2 0.3
756
013 0.4
0.8 BL 0.8BL BSCA DRFI 013
0.8 DSR HS KSR LEGS 0.8 MCDL MR RC wCO 0.8 Ours
1
0.8 BSCA 0.20.1
DRFI DSR 1 HS
0.1 0.2
KSR LEGS MCDL 1 MR
0.2 0.2
RC 1

757
014
1
014
1
0 0.2 0.41 0.6 0.8 1 0 00.7 0.20.2 0.40.4 0.60.6 0.8 0.8 1 1 0 0 0.2 0.2
1
0.4 0.4 0.6 0.6 0.8 0.8 1 1 0 0 0.2 0.2 0.4 0.
1. Performance Comparison 1. Performance Comparison
0.3
0.7 0.9 0.9 0.7 0.7
0.9 0.9
Recall Recall
Precision

Precision
Precision
Precision
0.9 0.6 0.9 Recall 0.9 Recall
Recall 0.9 R
758
0.8
015
1
0.2
0.6
0 0.2
0.8

0.4 0.70.6 0.8 1015 0.8 0.7


0.6
0.8
0.6
0.8

0.8
0.7 0.5 0.8 0.7 0.8 0.7
Recall
759
016 016
Precision

In the paper, we 0.7 evaluate the proposed In the paper, methodweonevaluate 4 benchmark the proposed 0.7 datasets, and compare
method on 4 0.5benchmark performance 0.7 datase of

Precision
Precision
Precision
0.5 0.7 0.6 0.5

ICCV
0.6 0.6 0.6
0.7 0.4

VICCV ICCV
Precision

Precision
Precision
Precision
0.5

760
017 0.4
with 0.5
11 state-of-the-art 017
methods including
with
0.6
BL [8], BSCA
11 state-of-the-art methods
0.4
[7], DRFI including [4], DSR BL [5], [8], HS BSCA [11],[7],
0.4
0.5
LEGS DRFI [9], [4]M 0.5

#726
0.6 0.3

6#726
0.4 0.6 0.6
0.6
#726 ICCV
0.4 0.5
761
018 018
0.4

RCSubmission [1], wCO [14], #726. and KSR2017 [10]. Here weCOPY. compare andour method 0.5 with Herethem on other our twoDISTRIBUTE.datasets: DU
0.4

RC [1], wCO [14], CONFIDENTIAL KSR [10]. we compare method with


0.3 0.3 0.3 0.3
0.5 0.2
ICCV 2017 CONFIDENTIAL REVIEW #726. DO NOT DISTRIBUTE.
0.5
0.3
ICCV 0.4
Submission #726
0.2
REVIEW COPY. DO NOT 0.3 0.3

762
019
0.2
0
SOD [6].
0.2 0.4
0.2 0.4
0 Performance
0.6
0.2
0.6
0.8
0.4
ICCV
1 019
0.1
0
0.6
2017
comparison Submission
0.2 SOD0.3 0.8 0.4in[6].
1 terms #726.
0.6Performance
0.1 CONFIDENTIAL
of0.8 PR curves 1ICCV
0.2
2017
comparison
0 and 0.2 Submission
F-measure
0.4 REVIEW
0.4 in terms 0.6 COPY.
#726.
curvesof
0.8 PRare DO
CONFIDENTIAL
1 curves
0.2
0.2 0 NOT
shown andDISTRIBU
0.2
in F-meas
0.4 0.4 REV
Figure 0.2

O020 NOT DISTRIBUTE. RecallRecall 0.9 Recall Recall


0.90 0.2 0.4 0.6
Recall
0.8 1 0 0.2
Recall
0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
F-Measure

0.5
7631 are0.2 shown in Table 1. 020
0.3
0.9 Visual comparison are shown in
0.2
of Table
0.9
the proposed 1. Visual method
Recall
comparison
0.3
againstofstate-of-the-art
Recall
the proposed method 0.8
on differen
0.3
aga 0.9

7640.9
0.8
021 Figure 7, 8, 9, 10. 021 Figure 7, 8, 9, Recall 10.
0.8 0.1 0.2 0.7 0.2 0.8

05
0 0.2 0.4 0.6 0.8 1
000 Recall
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2

CCV
F-Measure

0.7 0.5 0.7


Recall 0.6
0.7

7650.8
022
1
001 0.9
0.6
0.4 756 022 0.9 1 0.6 0.8 1 0.9
05
1 0.6

#726 BL 0.8 BSCA Ours DRFI1 10.8


ECSSD 810
DSR BL HKU-IS 0.8 BSCA HS KSR DRFI LEGS DSRPASCAL-S MCDLHS 0.5
MR
KSR RC
LEG
F-Measure

F-Measure

F-Measure
F-Measure

MR 1 0.8 RC wCO 0.5


766
0.9
023
0.5
023
0.9 0.5 0.9 0.8 0.9

757ICCV 20170.7Submission
10.7 1 1 1
002 0.7 1 0.8 1
811 #726. CONFIDENTIAL
0.8
0.8
0.4
0.80.8 05
0.90.60.9REVIEW COPY. DO NOT DISTRIBUTE.
0.4 0.8 0.4 0.4
Precision

0.8 0.8 0.7 0.8


0.7
767
0.9
024 0.4 758 024
0.9 0.9 0.3
0.9 0.9
0.3 0.9 0.3
05
003 0.3
0.9 0.7 0.7
0.9 0.7 0.7 0.3

812
0.7 0.6 0.6 0.7 0.7 0.7 0.7
0.6 0.2 0.6

Precision
Precision
Precision

pervised Game-Theoretic Approach to Salient Approach


Object Detection
0.2 0.8 0.2 0.80.5
768
An Unsupervised Game-Theoretic to Salient Object
0.2

025
0.6
025
F-Measure

0.8 0.8 0.60.8 0.8 0.8

F-Measure
0.8

F-Measure
F-Measure

05
0.8 0.6 0.6
004
0.6

759
0.5 0.5 0.6 0.50.6 0.6

813 0.7
0.5
0.1 0.1 0.6 0.1 0.1
0.7 0.7 0.5
0.70.4
769
026
0.5

1756
005
0.7 0.7
0.4 0
0261
0.7 0.40.50.7 0 0.7
0.5 0.5 0 0.5
0.4 0.7 0.7
05
0.5 0
F-Measure
Precision

F-Measure
Precision

Supplementary Material

F-Measure
0.4
Precision

Precision
Precision
Precision

Precision
Precision
Precision

Precision
Precision
0 0.2 0.4 0.6 0.8
0.4 1

Supplementary Material
0 0.2 0.4 0.6 0.8 1
814
0 0.2 0.4 0.6 0.8 1

760
1 0.2 1 0.5 0.5 0 0.2 0.4 0.6 0.8 1
0.4 ThresholdBL 0.8 BSCA 1 DRFI 0.6 Threshold DSR HS 1
KSR
Threshold 0.60.3 LEGS 0.4 MCDL Threshold MR 1 RC 0.4
770 0.6 0.3
0.3
027
0.6 0.6
0.3 027
0.40.6
0.3 0.4
0.6 0.4
0.3 0.6 0.6

9757 06
0.3 1 1 1 1
006
0.9 0.9 0.9 0.50.2
815 0.9 0.9 0.4
0.50.2 0.4
761
0.3 0.3 0.9 0.3
0.2 0.2 0.2
0.5 0.30.5 0.3 0.3
771
028
0.5 0.9
028
0.5 0.9 0.9 0.5 0.9 0.5 0.5
06
8758
0.2
1007 0
0.2
0.8 0.2
0.4 0.1
816 0.6
0.2 0.1 0.4 0.6 0.8 0.8
1 0.1 0.4 0.1
0.3 0.2
0.10.3
0 0.2 0.4
0.1 0.6 0.8 0.8 1
0.8 0.8 0 0.2 0.4 0.6 0.8 1
0.4
0.8 Recall 0.6 762
0.8
0.20.4 0 0.2 0.4 0.6 0.8
0.2 1 0 0.2 0.8
0.4 0.2 0.6 0.8 1

772 0.4 Recall Recall 0.4 0.8 0.8


029
Recall
029
Recall 0.4 0.4 0.4
759
7 008
0.7 0
0 0.2 0.4 0.7 0.6
0.2
0.7 0.8 1
0.3 0
0.7 0
817 Threshold
0.2 0.4 0.6 0.8
0.7 0.7
1
0.3 0
0.2
0.10 0.2 0.4 0.6 0.8 0.7 1
0
0.2 06
Anonymous ICCV
0.10.3

submission
0.3 0.7 0.70.1 0 0.2

Anonymous ICCV submission


0.7
763
0.7
773 Threshold

Precision
0.3 0.3
Precision
Precision

030 030

Precision
Precision
Precision

Threshold

Precision
In the paper, we evaluate the proposed method on four benchmark datasets and compare its performance with 11 state-of-the-art methods, including BL [27], BSCA [25], DRFI [13], DSR [19], HS [35], LEGS [30], MCDL [37], MR [36], RC [7], wCO [38], and KSR [33]. Here we compare our method with them on two further datasets, DUT-OMRON [36] and MSRA [21]. Performance comparison in terms of PR curves and F-measure curves is shown in Figs. 7, 10, and 11; F-measure scores are listed in Table 1 and AUC scores in Table 2. Visual comparisons of the proposed method against the state-of-the-art methods on the different datasets are shown in Figs. 8 and 12-15.

Fig. 7. Quantitative comparisons in terms of PR curves and F-measure curves on the ECSSD, HKU-IS, PASCAL-S, SOD, MSRA, and DUT-OMRON datasets.

Fig. 8. Visual comparison of saliency maps.

Fig. 10. Quantitative comparison of models in terms of PR curves.

Fig. 11. Quantitative comparison of models in terms of F-measure.

Table 1. F-measure scores. The best and the second best results are shown in red and green, respectively. Supervised methods are marked in bold.

Table 2. AUC scores. Supervised methods are marked in bold.
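For reference, the PR and F-measure curves above are obtained by binarizing each saliency map at a sweep of thresholds and comparing the resulting masks against the ground truth. The snippet below is a minimal sketch of this evaluation, not the exact code used in the paper; the helper names are ours, and beta^2 = 0.3 is the weighting commonly adopted in the salient object detection literature:

```python
import numpy as np

def pr_curve(sal, gt, num_thresh=256):
    # sal: saliency map with values in [0, 1]; gt: boolean ground-truth mask.
    precisions, recalls = [], []
    for t in np.linspace(0, 1, num_thresh):
        pred = sal >= t
        tp = np.logical_and(pred, gt).sum()          # true positives at this threshold
        precisions.append(tp / max(pred.sum(), 1))   # precision = TP / predicted foreground
        recalls.append(tp / max(gt.sum(), 1))        # recall = TP / ground-truth foreground
    return np.array(precisions), np.array(recalls)

def f_measure(precision, recall, beta2=0.3):
    # Weighted F-measure; beta^2 = 0.3 emphasizes precision over recall.
    return (1 + beta2) * precision * recall / np.maximum(beta2 * precision + recall, 1e-8)
```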
Among the compared methods, DRFI, LEGS, MCDL, and KSR are supervised: DRFI learns a random forest regressor, LEGS and MCDL train convolutional neural networks, and KSR learns a classifier and a subspace projection to rank object proposals based on R-CNN features. Since LEGS selects its training images from the MSRA dataset, we do not show its performance on MSRA.

We also compare our method with RFCN [31] fine-tuned on different numbers of training samples in Fig. 6. RFCN is a recently proposed deep learning based method that achieves excellent performance. However, it can be seen from the figure that RFCN does not do well without fine-tuning, and its F-measure increases as the number of training samples grows. Our method is equivalent to RFCN fine-tuned on about 5000 to 9000 images.

It can be inferred that the proposed Saliency Game is a special category of games named polymatrix games, in which each player plays a two-player game against each of the other players and its payoff is the sum of the payoffs of these two-player games. Howson [10] showed that every polymatrix game has at least one equilibrium; therefore, the proposed Saliency Game also has at least one, and could have more than one, equilibrium. Replicator Dynamics is invoked to find a Nash equilibrium of the game, and different Nash equilibria might be reached if the initial state is set to different interior points of the simplex. Empirically, we find that z_i(0) = (0.5, 0.5), for all i in I, leads to plausible saliency detection. In the supplementary material, we test the saliency detection results corresponding to the equilibria reached by Replicator Dynamics when started from different interior points of the simplex.
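To make the equilibrium-seeking procedure concrete, the following is a minimal sketch of discrete-time Replicator Dynamics for a two-strategy polymatrix game. It assumes the pairwise payoff matrices A[i, j] have already been built (the paper constructs them from multiple complementary cues, which is not reproduced here); all names and the foreground/background index convention are illustrative, not the paper's actual implementation:

```python
import numpy as np

def replicator_dynamics(A, z0, max_iter=500, tol=1e-6):
    # A : (n, n, 2, 2) array; A[i, j, h, k] is the payoff player i receives for
    #     pure strategy h when player j plays pure strategy k.
    # z0: (n, 2) initial mixed strategies; each row lies in the 2-simplex.
    z = z0.copy()
    for _ in range(max_iter):
        # Expected payoff of each pure strategy: u[i, h] = sum_j (A[i, j] @ z[j])[h].
        u = np.einsum('ijhk,jk->ih', A, z)
        u = u - u.min() + 1e-9                     # replicator updates need positive payoffs
        z_new = z * u                              # strategies with above-average payoff grow
        z_new /= z_new.sum(axis=1, keepdims=True)  # renormalize onto the simplex
        if np.abs(z_new - z).max() < tol:          # a fixed point has been reached
            return z_new
        z = z_new
    return z

n = 200                                      # e.g., number of superpixel players
A = np.random.rand(n, n, 2, 2)               # placeholder payoffs, for illustration only
z0 = np.full((n, 2), 0.5)                    # z_i(0) = (0.5, 0.5) for every player i
saliency = replicator_dynamics(A, z0)[:, 1]  # probability of playing "foreground"
```

Starting the dynamics from other interior points of the simplex may end at other fixed points, which is exactly the multiplicity of equilibria discussed above.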

In addition, compared with CNN based saliency detection methods, which need to be trained on images with pixel-level masks as ground truth, the proposed method extracts features from a pre-trained CNN and combines them with color features in an unsupervised manner. This provides an efficient complement to CNNs that performs on par with models that need labeled training data. Hopefully, our approach will encourage future models that can utilize both labeled and unlabeled data.

ACKNOWLEDGMENT

The authors would like to thank...

REFERENCES

[1] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk. Frequency-tuned salient region detection. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 1597–1604, 2009.
[2] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk. SLIC superpixels. Technical report, EPFL, 2010.
[3] A. Albarelli, S. R. Bulo, A. Torsello, and M. Pelillo. Matching as a non-cooperative game. In Proceedings of the IEEE International Conference on Computer Vision, pages 1319–1326, 2009.
[4] A. Borji, M.-M. Cheng, H. Jiang, and J. Li. Salient object detection: A benchmark. IEEE Transactions on Image Processing, 24(12):5706–5722, 2015.
[5] A. Borji and L. Itti. State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):185–207, 2013.
[6] A. Chakraborty and J. S. Duncan. Game-theoretic integration for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(1):12–30, 1999.
[7] M.-M. Cheng, N. J. Mitra, X. Huang, P. H. Torr, and S.-M. Hu. Global contrast based salient region detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3):569–582, 2015.
[8] A. Deligkas, J. Fearnley, T. P. Igwe, and R. Savani. An empirical study on computing equilibria in polymatrix games. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, 2016.
[9] A. Erdem and M. Pelillo. Graph transduction as a non-cooperative game. Neural Computation, 24(3):700–723, 2012.
[10] J. T. Howson, Jr. Equilibria of polymatrix games. Management Science, 18(5-Part-1):312–318, 1972.
[11] R. A. Hummel and S. W. Zucker. On the foundations of relaxation labeling processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5(3):267–287, 1983.
[12] B. Jiang, L. Zhang, H. Lu, C. Yang, and M.-H. Yang. Saliency detection via absorbing Markov chain. In Proceedings of the IEEE International Conference on Computer Vision, pages 1665–1672, 2013.
[13] H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, and S. Li. Salient object detection: A discriminative regional feature integration approach. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 2083–2090, 2013.
[14] Y. Kong, L. Wang, X. Liu, H. Lu, and X. Ruan. Pattern mining saliency. In Proceedings of European Conference on Computer Vision, 2016.
[15] P. Krähenbühl and V. Koltun. Geodesic object proposals. In Proceedings of European Conference on Computer Vision, pages 725–739, 2014.
[16] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[17] G. Lee, Y.-W. Tai, and J. Kim. Deep saliency with encoded low level distance map and high level features. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[18] G. Li and Y. Yu. Visual saliency based on multiscale deep features. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 5455–5463, 2015.
[19] X. Li, H. Lu, L. Zhang, X. Ruan, and M.-H. Yang. Saliency detection via dense and sparse reconstruction. In Proceedings of the IEEE International Conference on Computer Vision, pages 2976–2983, 2013.
[20] Y. Li, X. Hou, C. Koch, J. M. Rehg, and A. L. Yuille. The secrets of salient object segmentation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 280–287, 2014.
[21] T. Liu, J. Sun, N. N. Zheng, X. Tang, and H. Y. Shum. Learning to detect a salient object. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(2):353–367, 2011.
[22] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.
[23] D. A. Miller and S. W. Zucker. Copositive-plus Lemke algorithm solves polymatrix games. Operations Research Letters, 10(5):285–290, 1991.
[24] V. Movahedi and J. H. Elder. Design and perceptual validation of performance measures for salient object segmentation. In Computer Vision and Pattern Recognition Workshops, pages 49–56, 2010.
[25] Y. Qin, H. Lu, Y. Xu, and H. Wang. Saliency detection via cellular automata. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[26] P. D. Taylor and L. B. Jonker. Evolutionarily stable strategies and game dynamics. Mathematical Biosciences, 40(1-2):145–156, 1978.
[27] N. Tong, H. Lu, X. Ruan, and M.-H. Yang. Salient object detection via bootstrap learning. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 1884–1892, 2015.
[28] A. Torsello, S. R. Bulo, and M. Pelillo. Grouping with asymmetric affinities: A game-theoretic perspective. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 292–299, 2006.
[29] Z. Tu, Z. H. Zhou, W. Wang, J. Jiang, and B. Wang. Unsupervised metric fusion by cross diffusion. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 2997–3004, 2012.
[30] L. Wang, H. Lu, X. Ruan, and M.-H. Yang. Deep networks for saliency detection via local estimation and global search. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 3183–3192, 2015.
[31] L. Wang, L. Wang, H. Lu, P. Zhang, and X. Ruan. Saliency detection with recurrent fully convolutional networks. In Proceedings of European Conference on Computer Vision, 2016.
[32] Q. Wang, W. Zheng, and R. Piramuthu. GraB: Visual saliency via novel graph model and background priors. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[33] T. Wang, L. Zhang, H. Lu, C. Sun, and J. Qi. Kernelized subspace ranking for saliency detection. In Proceedings of European Conference on Computer Vision, 2016.
[34] J. W. Weibull. Evolutionary Game Theory. MIT Press, 1999.
[35] Q. Yan, L. Xu, J. Shi, and J. Jia. Hierarchical saliency detection. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 1155–1162, 2013.
[36] C. Yang, L. Zhang, H. Lu, X. Ruan, and M.-H. Yang. Saliency detection via graph-based manifold ranking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 3166–3173, 2013.
[37] R. Zhao, W. Ouyang, H. Li, and X. Wang. Saliency detection by multi-context deep learning. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 1265–1274, 2015.
[38] W. Zhu, S. Liang, Y. Wei, and J. Sun. Saliency optimization from robust background detection. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 2814–2821, 2014.

Yu Zeng received the B.S. degree in Electronic and Information Engineering from Dalian University of Technology (DUT), Dalian, China, in 2017. Her current research interests include image processing and computer vision with focus on image statistics and saliency detection.

Mengyang Feng received the B.E. degree in Electrical and Information Engineering from Dalian University of Technology in 2015. He is currently pursuing the Ph.D. degree under the supervision of Prof. H. Lu in Dalian University of Technology. His research interest focuses on salient object detection.
Fig. 12. Qualitative comparison of our method and state-of-the-art methods on the ECSSD dataset. Each row shows the input image, ground truth (GT), our result, and the results of KSR, LEGS, MCDL, DRFI, BSCA, BL, DSR, and MR.

Fig. 13. Qualitative comparison of our method and state-of-the-art methods on the HKU-IS dataset, with the same column layout as Fig. 12.

Huchuan Lu received the Ph.D. degree in System Engineering and the M.S. degree in Signal and Information Processing from Dalian University of Technology (DUT), Dalian, China, in 2008 and 1998, respectively. He joined the faculty in 1998 and currently is a Full Professor of the School of Information and Communication Engineering, DUT. His current research interests include computer vision and pattern recognition with focus on visual tracking, saliency detection, and segmentation. He is a member of the ACM and an Associate Editor of the IEEE Transactions on Cybernetics.

Ali Borji received the B.S. degree in Computer Engineering from the Petroleum University of Technology, Tehran, Iran, in 2001, the M.S. degree in computer engineering from Shiraz University, Shiraz, Iran, in 2004, and the Ph.D. degree in cognitive neurosciences from the Institute for Studies in Fundamental Sciences, Tehran, Iran, in 2009. He spent four years as a Post-Doctoral Scholar with iLab, University of Southern California, from 2010 to 2014. He is currently an Assistant Professor with the University of Wisconsin, Milwaukee. His research interests include visual attention, active learning, object and scene recognition, and cognitive and computational neurosciences.

Fig. 14. Qualitative comparison of our method and state-of-the-art methods on the PASCAL-S dataset. Each row shows the input image, ground truth (GT), our result, and the results of KSR, MCDL, DRFI, BSCA, BL, DSR, MR, and HS.

Fig. 15. Qualitative comparison of our method and state-of-the-art methods on the MSRA dataset. Each row shows the input image, ground truth (GT), our result, and the results of BSCA, BL, DSR, MR, HS, RC, and wCO.