IM
Definition: The influence maximization (IM) problem asks for a set of k nodes that
maximizes the expected spread of influence in the network. The set of these k initial
nodes is called the seed set.
Mathematically, we are given:
a ground set V (the set of users in the social network),
an information diffusion model,
$k \in \mathbb{Z}^{+}$: a global cardinality constraint,
$S \subseteq V$ with $|S| \le k$: the set of users to be determined (the seed set),
$f(S)$: the influence function, i.e., the expected number of users in V that can be
influenced by the members of S under the given diffusion model.
The problem is then to find
$$\max_{S \subseteq V,\ |S| \le k} f(S).$$
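Since f(S) is an expectation, it usually has no closed form and is commonly estimated by Monte Carlo simulation. The sketch below assumes the independent cascade (IC) model, in which each newly activated node u gets one chance to activate each inactive out-neighbor v with propagation probability p; the graph layout (a dict of (neighbor, probability) lists) and the helper names `simulate_ic` and `estimate_spread` are hypothetical choices for illustration.

```python
import random
from collections import deque

def simulate_ic(graph, seeds, rng):
    """One independent-cascade run from `seeds`; returns the set of active nodes.

    graph: dict mapping node -> list of (out_neighbor, propagation_probability).
    """
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        u = frontier.popleft()
        for v, p in graph.get(u, []):
            # u gets exactly one chance to activate each still-inactive out-neighbor v
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active

def estimate_spread(graph, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of f(S): average number of active nodes over `runs` cascades."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs))
    return total / runs
```

For example, on the chain a → b → c with all probabilities equal to 1, `estimate_spread(graph, {"a"})` returns 3.0; more runs trade time for a lower-variance estimate.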
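In the linear threshold (LT) model, each node v assigns a weight $b_{v,w} \ge 0$ to each
neighbor w such that
$$\sum_{w \text{ neighbor of } v} b_{v,w} \le 1. \qquad (1)$$
A node v becomes active when at least a $\theta_v$ (weighted) fraction of its neighbors is
active, that is,
$$\sum_{w \text{ active neighbor of } v} b_{v,w} \ge \theta_v. \qquad (2)$$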
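Conditions (1) and (2) can be simulated directly. This is a minimal sketch, assuming thresholds $\theta_v$ drawn uniformly at random (here from (0, 1], the usual LT choice) and a hypothetical `in_weights` dict that stores $b_{v,w}$ for each node v; the function name `simulate_lt` is also illustrative.

```python
import random

def simulate_lt(in_weights, seeds, rng):
    """One run of the linear threshold model; returns the final active set.

    in_weights: dict mapping v -> list of (w, b_vw), with sum of b_vw <= 1 (Eq. 1).
    """
    nodes = set(in_weights) | {w for ws in in_weights.values() for w, _ in ws} | set(seeds)
    # thresholds drawn uniformly from (0, 1]; 1 - random() avoids a zero threshold
    theta = {v: 1.0 - rng.random() for v in nodes}
    active = set(seeds)
    changed = True
    while changed:  # repeat until no further node can be activated
        changed = False
        for v in nodes - active:
            # Eq. (2): total weight of active neighbors reaches the threshold
            weight = sum(b for w, b in in_weights.get(v, []) if w in active)
            if weight >= theta[v]:
                active.add(v)
                changed = True
    return active
```

Because activation is monotone once thresholds are fixed, the order in which nodes are checked does not change the final active set.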
[Figure: Red-colored nodes a and d are active. The two green areas enclose the nodes
activated by a and d respectively, i.e., $X_a$ and $X_d$.]
Note:
In this example, f(S) is the size of the union of the sets $X_u$:
$f(S) = \left|\bigcup_{u \in S} X_u\right|$.
A set S is more influential if f(S) is larger.
As mentioned before, the influence maximization problem is then an optimization problem:
$$\max_{S \subseteq V,\ |S| \le k} f(S).$$
This problem is NP-hard [Kempe et al. 2003]. However, a greedy approximation algorithm,
hill climbing, finds a solution S with the following approximation guarantee:
$$f(S) \ge \left(1 - \frac{1}{e}\right) f(\mathrm{OPT}),$$
where OPT is an optimal seed set.
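Hill climbing repeatedly adds the node with the largest marginal gain in f. The sketch below is one possible instantiation, assuming the IC model and Monte Carlo estimates of f; with sampled estimates the $(1 - 1/e)$ guarantee holds only approximately. The names `greedy_im`, `simulate_ic` and `estimate_spread`, and the default `runs=200`, are hypothetical.

```python
import random
from collections import deque

def simulate_ic(graph, seeds, rng):
    """One IC cascade; graph maps node -> list of (out_neighbor, probability)."""
    active, frontier = set(seeds), deque(seeds)
    while frontier:
        u = frontier.popleft()
        for v, p in graph.get(u, []):
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active

def estimate_spread(graph, seeds, runs, rng):
    """Monte Carlo estimate of f(S)."""
    return sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs)) / runs

def greedy_im(graph, k, runs=200, seed=0):
    """Hill climbing: k times, add the node with the largest estimated marginal gain."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v, _ in nbrs}
    S = set()
    for _ in range(k):
        base = estimate_spread(graph, S, runs, rng)
        best, best_gain = None, float("-inf")
        for u in sorted(nodes - S, key=str):  # fixed order for reproducible tie-breaking
            gain = estimate_spread(graph, S | {u}, runs, rng) - base
            if gain > best_gain:
                best, best_gain = u, gain
        S.add(best)
    return S
```

Each step costs one spread estimate per candidate node, which is why practical IM work focuses on cheaper estimators of the marginal gain.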
Hardness of IM
IM is NP-hard under both the IC and LT models:
IC model: by reduction from the k-max cover problem.
LT model: by reduction from the vertex cover problem.
Hence we need approximation algorithms.
Optimizing submodular functions
Set coverage:
Each entry u is a set of some base elements.
Coverage: $f(S) = \left|\bigcup_{u \in S} u\right|$
Greedy algorithm:
1: S = ∅
2: for i = 1 to k do
2.1: u = argmax_{w ∈ V \ S} ( f(S ∪ {w}) − f(S) )
2.2: S = S ∪ {u}
3: end for
4: return S
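For the set coverage objective, the greedy algorithm can be written directly. This is a minimal sketch, assuming the sets are given as a dict from a set's name to its base elements; `greedy_max_coverage` is a hypothetical name.

```python
def greedy_max_coverage(sets, k):
    """Greedy for max coverage; `sets` maps a name u to its set of base elements.

    Returns the chosen names in order and the number of covered elements.
    """
    S, covered = [], set()
    for _ in range(k):
        # pick the set with the largest marginal gain f(S ∪ {u}) - f(S)
        u = max((name for name in sets if name not in S),
                key=lambda name: len(sets[name] - covered), default=None)
        if u is None:  # fewer than k sets available
            break
        S.append(u)          # line 2.2: S = S ∪ {u}
        covered |= sets[u]   # keep the current union so gains are cheap to compute
    return S, len(covered)
```

With `sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}}` and k = 2, it picks A (gain 3) and then C (gain 2 versus 1 for B), covering 5 elements.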
Property of the greedy algorithm
Theorem 1 (Nemhauser, Wolsey and Fisher¹): If the set function f is monotone and
submodular with $f(\emptyset) = 0$, then the greedy algorithm achieves a $(1 - 1/e)$
approximation ratio; that is, the solution S found by the greedy algorithm satisfies
$$f(S) \ge \left(1 - \frac{1}{e}\right) \max_{S' \subseteq V,\ |S'| = k} f(S').$$
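Theorem 1 can be checked on a small coverage instance where greedy is not optimal. The instance below is a hypothetical example: greedy first takes the largest set A and ends with 5 covered elements, while the optimum {B, C} covers 6, still well above $(1 - 1/e) \cdot 6 \approx 3.79$.

```python
import math
from itertools import combinations

def coverage(sets, S):
    """f(S) = size of the union of the chosen sets."""
    return len(set().union(*(sets[u] for u in S)))

def greedy(sets, k):
    S = []
    for _ in range(k):
        # add the set whose inclusion gives the largest coverage
        u = max((x for x in sets if x not in S), key=lambda x: coverage(sets, S + [x]))
        S.append(u)
    return S

def optimum(sets, k):
    """Brute force over all k-subsets (only feasible for tiny instances)."""
    return max(coverage(sets, c) for c in combinations(sets, k))

sets = {"A": {1, 2, 3, 4}, "B": {1, 2, 5}, "C": {3, 4, 6}}
g, opt = coverage(sets, greedy(sets, 2)), optimum(sets, 2)
print(g, opt, g >= (1 - 1 / math.e) * opt)   # 5 6 True
```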
Live-edge graph equivalence:
Pr(set A is activated given seed set S) = Pr(set A is reachable from S in a random
live-edge graph)
¹ Nemhauser, Wolsey and Fisher, Mathematical Programming, 1978.
[Figure: Active node set via the IC diffusion process. The yellow node set is the active
node set after the diffusion process in the independent cascade model.]
Random live-edge graph in the IC model
Each edge is independently selected as live with its propagation probability.
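[Figure: The yellow node set is the active node set reachable from the seed set in a
random live-edge graph.]
Equivalence is straightforward: in the IC model each edge (u, v) gets exactly one
activation attempt, which succeeds with its propagation probability, so flipping all of
these coins in advance and keeping the successful edges as live gives the same
distribution of active sets.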
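The live-edge view gives a second way to estimate f(S) under IC: sample a live-edge graph, then count the nodes reachable from S. A minimal sketch, assuming the same dict-of-(neighbor, probability) graph layout; the helper names are hypothetical.

```python
import random

def sample_live_edge_ic(graph, rng):
    """Keep each edge (u, v) independently with its propagation probability p."""
    return {u: [v for v, p in nbrs if rng.random() < p] for u, nbrs in graph.items()}

def reachable(live, seeds):
    """R(S, G_L): nodes reachable from the seed set in the live-edge graph."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for v in live.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def estimate_spread_live_edge(graph, seeds, runs=1000, seed=0):
    """Average |R(S, G_L)| over `runs` sampled live-edge graphs."""
    rng = random.Random(seed)
    return sum(len(reachable(sample_live_edge_ic(graph, rng), seeds))
               for _ in range(runs)) / runs
```

For a single edge a → b with probability 0.5, the estimate for S = {a} is close to 1.5, matching the expected spread of the cascade itself.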
[Figure: The yellow node set is the active node set after the diffusion process in the
linear threshold model.]
Random live-edge graph in the LT model
Each node v selects at most one incoming edge: the edge from neighbor w with probability
$b_{v,w}$, and no edge with probability $1 - \sum_{w} b_{v,w}$.
[Figure: The yellow node set is the active node set reachable from the seed set in a
random live-edge graph.]
Equivalence is based on drawing each threshold $\theta_v$ uniformly from [0, 1] and on
the linear addition of weights.
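Sampling a live-edge graph in the LT model follows the rule above: each node keeps at most one incoming edge, choosing neighbor w with probability $b_{v,w}$. This sketch assumes the hypothetical `in_weights` layout (v maps to a list of (w, $b_{v,w}$) pairs) and returns out-adjacency lists so that reachability can be computed from the seeds.

```python
import random

def sample_live_edge_lt(in_weights, rng):
    """Each node v keeps at most one incoming edge (w, v), chosen with probability b_vw;
    with probability 1 - sum_w b_vw it keeps none. Returns out-adjacency lists."""
    live = {}
    for v, ws in in_weights.items():
        r, acc = rng.random(), 0.0
        for w, b in ws:
            acc += b
            if r < acc:  # r falls in w's slice of [0, 1)
                live.setdefault(w, []).append(v)  # live edge w -> v
                break
    return live

def reachable(live, seeds):
    """Nodes reachable from the seeds along live edges."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        for v in live.get(stack.pop(), []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen
```

Splitting [0, 1) into consecutive slices of length $b_{v,w}$ is one simple way to pick at most one edge; the leftover slice of length $1 - \sum_w b_{v,w}$ corresponds to choosing no edge.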
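Submodularity of $|R(\cdot, G_L)|$, where $R(S, G_L)$ is the set of nodes reachable from S
in the live-edge graph $G_L$:
for any $S \subseteq T \subseteq V$ and $v \in V \setminus T$,
  - if u is reachable from v but not from T, then u is reachable from v but not from S
    (since $S \subseteq T$).
Hence $R(T \cup \{v\}, G_L) \setminus R(T, G_L) \subseteq R(S \cup \{v\}, G_L) \setminus R(S, G_L)$,
so the marginal gain of v with respect to T is at most its marginal gain with respect to S.
∴ $|R(\cdot, G_L)|$ is submodular. Since f(S) is the expectation of $|R(S, G_L)|$ over
random live-edge graphs, i.e., a nonnegative combination of submodular functions, f is
submodular as well.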
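The diminishing-returns inequality can be verified exhaustively on a tiny fixed live-edge graph. This is a brute-force sketch for illustration only (its cost grows exponentially with the number of nodes); the graph and the name `is_submodular_reach` are hypothetical.

```python
from itertools import combinations

def reachable(live, seeds):
    """R(S, G_L) for a fixed live-edge graph given as out-adjacency lists."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        for v in live.get(stack.pop(), []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def is_submodular_reach(live, nodes):
    """Check |R(S+v)| - |R(S)| >= |R(T+v)| - |R(T)| for all S ⊆ T and v not in T."""
    subsets = [set(c) for r in range(len(nodes) + 1) for c in combinations(nodes, r)]
    for T in subsets:
        for S in subsets:
            if not S <= T:
                continue
            for v in set(nodes) - T:
                gain_S = len(reachable(live, S | {v})) - len(reachable(live, S))
                gain_T = len(reachable(live, T | {v})) - len(reachable(live, T))
                if gain_S < gain_T:
                    return False
    return True

live = {1: [2], 2: [3], 4: [3]}   # a fixed live-edge graph
print(is_submodular_reach(live, [1, 2, 3, 4]))   # True
```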
² Mossel & Roch, STOC 2007.