
Introduction

Definition: The influence maximization (IM) problem asks for a set of k nodes that
maximizes the expected spread of influence in the network. This set of k initial nodes is
called the seed set.
Mathematically, given:
• a ground set V (the set of users in the social network),
• an information diffusion model,
• k ∈ ℤ⁺: a global cardinality constraint,
• S ⊆ V with |S| ≤ k: the set of users to be determined,
• f(S): the influence function, measuring the expected number of users in V that can be
activated by members of S under the given diffusion model,
the IM problem is to find:

$$\max_{S \subseteq V,\ |S| \le k} f(S)$$

A typical application of IM is “viral marketing”, which uses existing social networks to spread
and promote a product.
The key question is how to find the most influential set of nodes. To answer this
question, we first look at two classical cascade models:
• Linear Threshold Model
• Independent Cascade Model

Linear Threshold Model


In the Linear Threshold Model, we have the following setup:
• A node v has a random threshold θ_v ∼ U[0, 1].
• A node v is influenced by each neighbor w according to a non-negative weight b_{v,w},
such that:

$$\sum_{w \text{ neighbor of } v} b_{v,w} \le 1 \qquad (1)$$

• A node v becomes active when the total weight of its active neighbors is at least θ_v.
That is,

$$\sum_{w \text{ active neighbor of } v} b_{v,w} \ge \theta_v \qquad (2)$$

The following figure demonstrates the process:


Explanation:
(A) Node V is activated (it satisfies condition (2)) and influences W and U by 0.5 and
0.2, respectively.
(B) W becomes activated and influences X and U by 0.5 and 0.3, respectively.
(C) U becomes activated and influences X and Y by 0.1 and 0.2, respectively.
(D) X becomes activated and influences Y by 0.2; at this point no more nodes can be
activated, so the process stops.

Independent Cascade Model


In this model, we model the influence (activation) of nodes probabilistically on a
directed graph:
• Given a directed graph G = (V, E).
• Given a node set S that starts with a new behavior (e.g., adopted a new product; we
say these nodes are active).
• Each edge (v, w) has a probability p_vw.
• If node v becomes active, it gets one chance to make w active, succeeding with
probability p_vw.
• Activation spreads through the network.
Note that:
• Each edge fires only once.
• If u and v are both active and link to w, it does not matter which one tries to activate w
first.
Some definitions:
• Most influential set of size k (k is a user-defined parameter): a set S containing
k nodes that, if activated, produces the largest expected cascade size f(S).
• Influence set X_u of node u: the set of nodes that will eventually be activated by
node u. An example is shown below.

Red-colored nodes a and d are active. The two green areas enclose the nodes
activated by a and d, respectively, i.e., X_a and X_d.
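Note:
• For a given realization of the random activations, f(S) is the size of the union of the
influence sets X_u: $f(S) = \left|\bigcup_{u \in S} X_u\right|$. The expected cascade size
averages this quantity over realizations.
• Set S is more influential if f(S) is larger.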
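For a single realization, taking the union rather than the sum matters whenever influence sets overlap. A small Python sketch, using hypothetical influence sets X_a and X_d (the figure's actual sets are not reproduced here; each X_u is assumed to include u itself):

```python
def spread_from_influence_sets(influence_sets, seeds):
    """f(S) for one realization: the size of the union of X_u over u in S."""
    covered = set()
    for u in seeds:
        covered |= influence_sets[u]  # overlapping nodes are counted only once
    return len(covered)

# Hypothetical influence sets that share node "c".
X = {"a": {"a", "b", "c"}, "d": {"c", "d", "e"}}
print(spread_from_influence_sets(X, {"a", "d"}))  # 5, not 3 + 3 = 6
```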
As mentioned before, the influence maximization problem is then an optimization problem:

$$\max_{S \subseteq V,\ |S| \le k} f(S)$$

This problem is NP-hard [Kempe et al. 2003]. However, a greedy approximation
algorithm, Hill Climbing, gives a solution S with the following approximation
guarantee:

$$f(S) \ge \left(1 - \frac{1}{e}\right) f(OPT)$$

where OPT is the globally optimal solution.


The independent cascade model can be stated formally as follows:
• A node w, once activated at step t, has one random chance to activate each of its
neighbors.
• For a neighbor node v, the activation succeeds with probability p_wv (e.g.,
p = 0.5).
• If the activation succeeds, then v becomes active at step t + 1.
• In subsequent rounds, w does not attempt to activate v again.
• The diffusion process starts with an initial set of active nodes and continues
until no further activation is possible.
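The round-by-round process above can be sketched in Python as follows. This is a minimal illustration, assuming a hypothetical `out_edges` dictionary with `out_edges[w][v]` = p_wv; `estimate_spread` approximates the expected cascade size f(S) by averaging many independent runs (Monte Carlo simulation), which is one common way to evaluate f but not the only one.

```python
import random

def simulate_ic(out_edges, seeds, rng):
    """One run of the Independent Cascade model; returns the final active set.

    out_edges[w] maps each out-neighbor v of w to the propagation probability p_wv.
    """
    active = set(seeds)
    frontier = list(seeds)  # nodes that became active at the current step t
    while frontier:  # stop when no further activation is possible
        next_frontier = []
        for w in frontier:
            # w gets exactly one chance to activate each inactive out-neighbor.
            for v, p in out_edges.get(w, {}).items():
                if v not in active and rng.random() < p:
                    active.add(v)  # v becomes active at step t + 1
                    next_frontier.append(v)
        frontier = next_frontier  # w is never retried in later rounds
    return active

def estimate_spread(out_edges, seeds, runs=10_000, seed=0):
    """Monte Carlo estimate of the expected cascade size f(S)."""
    rng = random.Random(seed)
    return sum(len(simulate_ic(out_edges, seeds, rng)) for _ in range(runs)) / runs

graph = {"a": {"b": 0.5}, "b": {"c": 0.5}}  # hypothetical two-edge chain
print(estimate_spread(graph, {"a"}))
```

For this chain the exact value is 1 + 0.5 + 0.5 · 0.5 = 1.75, so the printed estimate should land close to it.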
Note: p is the propagation probability. If every edge has propagation probability p and a
vertex has L neighbors in the seed set, then it has a

$$1 - (1 - p)^L$$

chance of being activated in the next round.
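As a quick numeric check of this formula: with L = 2 seed neighbors and p = 0.5, the vertex is activated with probability 1 − 0.5² = 0.75. The hypothetical helpers below compute the closed form and compare it with a direct simulation of L independent attempts.

```python
import random

def activation_probability(p, L):
    """Closed form 1 - (1 - p)^L: the vertex stays inactive only if all L attempts fail."""
    return 1 - (1 - p) ** L

def simulated_probability(p, L, trials=100_000, seed=0):
    """Empirical check: flip L independent coins, each succeeding with probability p."""
    rng = random.Random(seed)
    hits = sum(any(rng.random() < p for _ in range(L)) for _ in range(trials))
    return hits / trials

print(activation_probability(0.5, 2))  # 0.75
print(simulated_probability(0.5, 2))
```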

Live-edge
Hardness of IM
IM under both the IC and LT models is NP-hard:
• IC model: proof by reduction from the k-max cover problem.
• LT model: proof by reduction from the vertex cover problem.
Approximation algorithms are therefore needed.
Optimizing submodular functions

Submodularity of set functions f : 2^V → ℝ
• For all S ⊆ T ⊆ V and all v ∈ V \ T:

$$f(S \cup \{v\}) - f(S) \ge f(T \cup \{v\}) - f(T)$$

• This is the diminishing marginal return property. An equivalent form: for all S, T ⊆ V,

$$f(S \cup T) + f(S \cap T) \le f(S) + f(T)$$

Monotonicity of set functions f: for all S ⊆ T ⊆ V, f(S) ≤ f(T). This means that adding
more elements to a set cannot decrease its value.
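For small ground sets, both definitions can be checked by brute force over all subsets. The Python sketch below is illustrative only: it enumerates all 2^|V| subsets, so it is feasible only for tiny V, and the set-coverage instance and function names are hypothetical.

```python
from itertools import combinations

def all_subsets(ground):
    """Every subset of the ground set, as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def is_monotone(f, ground):
    """Check f(S) <= f(T) for all S ⊆ T ⊆ V."""
    subsets = all_subsets(ground)
    return all(f(S) <= f(T) for S in subsets for T in subsets if S <= T)

def is_submodular(f, ground):
    """Check f(S ∪ {v}) - f(S) >= f(T ∪ {v}) - f(T) for all S ⊆ T ⊆ V and v outside T."""
    subsets = all_subsets(ground)
    return all(
        f(S | {v}) - f(S) >= f(T | {v}) - f(T)
        for S in subsets for T in subsets if S <= T
        for v in ground - T
    )

# Hypothetical set-coverage instance: element u covers the base elements sets[u].
sets = {"A": {1, 2}, "B": {2, 3}, "C": {3, 4, 5}}

def coverage(S):
    return len(set().union(*(sets[u] for u in S)))

def square(S):
    return len(S) ** 2  # monotone but not submodular

V = set(sets)
print(is_monotone(coverage, V), is_submodular(coverage, V))  # True True
print(is_monotone(square, V), is_submodular(square, V))      # True False
```

The second function fails because, for example, adding one element to ∅ gains 1 while adding it to a singleton gains 3, so marginal returns increase instead of diminishing.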
Example of a submodular function and its maximization problem

Set coverage:
• Each entry u is a set of some base elements.
• Coverage: $f(S) = \left|\bigcup_{u \in S} u\right|$
• f(S ∪ {v}) − f(S): the additional coverage of v on top of S.


k-max cover problem
• Find k subsets that maximize their total coverage.
• NP-hard.
• A special case of the IM problem in the IC model.
Greedy algorithm for submodular function maximization
Algorithm: Greedy optimization
Input: Graph G(V, E), k
Output: Seed set S of size k
1: initialize S = ∅;
2: for i = 1 to k do
2.1:   select u = argmax_{w ∈ V \ S} [f(S ∪ {w}) − f(S)];
2.2:   S = S ∪ {u};
3: end for
4: return S
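The greedy pseudocode translates almost line for line into Python. In this sketch, f is any set function passed in as a callable; for IM it would be the expected spread, typically estimated by simulation, whereas here a hypothetical set-coverage instance (a k-max cover problem) is used so the result can be compared with the brute-force optimum. Ties in the argmax are broken by sorted order, an arbitrary choice made for reproducibility.

```python
import math
from itertools import combinations

def greedy_max(f, ground, k):
    """Greedy hill climbing: k times, add the element with the largest marginal gain."""
    S = set()
    for _ in range(k):
        candidates = sorted(ground - S)  # sorted so ties are broken reproducibly
        if not candidates:
            break
        u = max(candidates, key=lambda w: f(S | {w}) - f(S))  # step 2.1
        S.add(u)                                               # step 2.2
    return S

def brute_force_max(f, ground, k):
    """Exact optimum over all k-element subsets (only feasible for tiny inputs)."""
    return max((set(c) for c in combinations(sorted(ground), k)), key=f)

def coverage(sets):
    """Set-coverage function f(S) = |union of sets[u] for u in S|."""
    return lambda S: len(set().union(*(sets[u] for u in S)))

sets = {"A": {1, 2, 3, 4}, "B": {1, 2, 5}, "C": {3, 4, 6}}  # hypothetical instance
f = coverage(sets)
greedy = greedy_max(f, set(sets), 2)
best = brute_force_max(f, set(sets), 2)
print(sorted(greedy), f(greedy))  # ['A', 'B'] 5
print(sorted(best), f(best))      # ['B', 'C'] 6
```

Greedy first takes A (the largest single coverage, 4) and ends with coverage 5, while the optimum {B, C} covers 6. The ratio 5/6 ≈ 0.83 is consistent with the (1 − 1/e) ≈ 0.63 guarantee stated next.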
Property of the greedy algorithm
Theorem¹: If the set function f is monotone and submodular with f(∅) = 0, then the greedy
algorithm achieves a (1 − 1/e) approximation ratio; that is, the solution S found by the
greedy algorithm satisfies:

$$f(S) \ge \left(1 - \frac{1}{e}\right) \max_{S' \subseteq V,\ |S'| = k} f(S')$$

• Here S' ranges over k-element sets, and the maximum is attained by the k-element set
achieving the maximal value of f.


Submodularity of influence diffusion models
Based on equivalent live-edge graphs:

Pr(set A is activated given seed set S) = Pr(set A is reachable from S in a random live-edge graph)

¹ Nemhauser, Wolsey and Fisher, Mathematical Programming, 1978
Figure: Active node set via the IC diffusion process. The yellow node set is the active node
set after the diffusion process in the independent cascade model.
Random live-edge graph in the IC model
• Each edge is independently selected as live with its propagation probability.
• The yellow node set is the active node set reachable from the seed set in a
random live-edge graph.
• Equivalence is straightforward: each edge gets only one activation attempt, so its
coin can be flipped in advance, before the diffusion starts.
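The live-edge view of the IC model can be sketched in Python: flip every edge's coin up front, then collect the nodes reachable from the seeds. The dictionary layout (`out_edges[w][v]` = p_wv) and the function names are assumptions for illustration. Averaging |R(S, G_L)| over many sampled graphs gives a Monte Carlo estimate of the expected spread.

```python
import random
from collections import deque

def sample_ic_live_edge_graph(out_edges, rng):
    """Keep each edge (w, v) independently with its propagation probability p_wv."""
    return {w: [v for v, p in nbrs.items() if rng.random() < p]
            for w, nbrs in out_edges.items()}

def reachable(live, seeds):
    """R(S, G_L): all nodes reachable from the seed set along live edges (BFS)."""
    seen, queue = set(seeds), deque(seeds)
    while queue:
        for v in live.get(queue.popleft(), []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def live_edge_spread(out_edges, seeds, samples=10_000, seed=0):
    """Average |R(S, G_L)| over independently sampled live-edge graphs."""
    rng = random.Random(seed)
    total = sum(len(reachable(sample_ic_live_edge_graph(out_edges, rng), seeds))
                for _ in range(samples))
    return total / samples

graph = {"a": {"b": 0.5}, "b": {"c": 0.5}}  # hypothetical two-edge chain
print(live_edge_spread(graph, {"a"}))
```

For this chain the exact spread is 1 + 0.5 + 0.25 = 1.75, the same value obtained by simulating the cascade step by step, which is exactly the equivalence stated above.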
Figure: The yellow node set is the active node set after the diffusion process in the
linear threshold model.
Random live-edge graph in the LT model
• Each node v selects at most one incoming edge: the edge from w with probability equal
to its weight b_{v,w}, and no edge with probability 1 − Σ_w b_{v,w}.
• The yellow node set is the active node set reachable from the seed set in a
random live-edge graph.
• Equivalence is based on the uniform selection of thresholds from [0, 1] and the linear
addition of weights.
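Similarly, here is a minimal Python sketch of the LT live-edge construction, in which each node keeps at most one incoming edge, chosen with probability equal to its weight. The `in_weights[v][w]` = b_{v,w} layout and the function names are hypothetical; reachability is the same breadth-first search used for any live-edge graph.

```python
import random
from collections import deque

def sample_lt_live_edge_graph(in_weights, rng):
    """Each node v keeps the edge from w with probability b_{v,w},
    or no incoming edge with probability 1 - sum_w b_{v,w}."""
    live = {}  # out-adjacency lists of the sampled live-edge graph
    for v, nbrs in in_weights.items():
        r, cumulative = rng.random(), 0.0
        for w, b in nbrs.items():
            cumulative += b
            if r < cumulative:  # r fell into w's slice of [0, 1)
                live.setdefault(w, []).append(v)
                break
    return live

def reachable(live, seeds):
    """Nodes reachable from the seed set along live edges (BFS)."""
    seen, queue = set(seeds), deque(seeds)
    while queue:
        for v in live.get(queue.popleft(), []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

# Hypothetical chain a -> b -> c with b_{b,a} = b_{c,b} = 0.5.
weights = {"b": {"a": 0.5}, "c": {"b": 0.5}}
rng = random.Random(0)
runs = 10_000
print(sum(len(reachable(sample_lt_live_edge_graph(weights, rng), {"a"}))
          for _ in range(runs)) / runs)
```

In the threshold view of this chain, b becomes active exactly when θ_b ≤ 0.5, and c when additionally θ_c ≤ 0.5, so the expected spread is again 1.75 and the printed average should be close to it.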

Influence spread of seed set S, σ(S):

$$\sigma(S) = \sum_{G_L} \Pr(G_L)\,\left|R(S, G_L)\right|$$

• G_L: a random live-edge graph
• Pr(G_L): the probability of G_L being generated
• R(S, G_L): the set of nodes reachable from S in G_L
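For a tiny IC graph, this sum can be evaluated exactly by enumerating all 2^|E| live-edge graphs. The Python sketch below does so; the edge-list format `(w, v, p_wv)` and the function name are assumptions, and the exponential enumeration makes it a teaching aid rather than a practical method.

```python
from itertools import product

def exact_ic_spread(edges, seeds):
    """sigma(S) = sum over live-edge graphs G_L of Pr(G_L) * |R(S, G_L)|.

    edges is a list of (w, v, p_wv) triples; all 2^|E| live/blocked choices are
    enumerated, so this is only feasible for very small graphs.
    """
    total = 0.0
    for choice in product([False, True], repeat=len(edges)):
        prob, live = 1.0, {}
        for (w, v, p), is_live in zip(edges, choice):
            prob *= p if is_live else 1 - p  # edges are live independently (IC)
            if is_live:
                live.setdefault(w, []).append(v)
        # |R(S, G_L)| by depth-first search over the live edges.
        seen, stack = set(seeds), list(seeds)
        while stack:
            for x in live.get(stack.pop(), []):
                if x not in seen:
                    seen.add(x)
                    stack.append(x)
        total += prob * len(seen)
    return total

chain = [("a", "b", 0.5), ("b", "c", 0.5)]  # hypothetical example
print(exact_ic_spread(chain, {"a"}))  # 1.75
```

For this chain the four live-edge graphs each have probability 0.25 and reach 1, 1, 2 and 3 nodes, giving σ({a}) = 1.75.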
To prove that σ(S) is submodular, we only need to show that |R(·, G_L)| is submodular for
any G_L (submodularity is preserved under linear combinations with non-negative
coefficients).

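Submodularity of |R(·, G_L)|:

• For any S ⊆ T ⊆ V and v ∈ V \ T, the marginal gain |R(S ∪ {v}, G_L)| − |R(S, G_L)| counts
the nodes reachable from v but not from S.
  o If u is reachable from v but not from T, then u is reachable from v but not from S
  (since R(S, G_L) ⊆ R(T, G_L)). Hence the marginal gain of v on S is at least its
  marginal gain on T.
• ∴ |R(·, G_L)| is submodular.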
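The argument can also be checked mechanically on a small example. The following Python sketch fixes one hypothetical live-edge graph and verifies by brute force, over all S ⊆ T and v ∉ T, that the marginal gain of v never increases as the seed set grows. It illustrates the claim on one instance and is not a proof.

```python
from itertools import combinations

def reach_size(live, seeds):
    """|R(S, G_L)| for a fixed live-edge graph given as out-adjacency lists."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        for x in live.get(stack.pop(), []):
            if x not in seen:
                seen.add(x)
                stack.append(x)
    return len(seen)

def check_reach_submodular(live, nodes):
    """Brute-force check of |R(S+v)| - |R(S)| >= |R(T+v)| - |R(T)| for all S ⊆ T, v ∉ T."""
    subsets = [frozenset(c) for r in range(len(nodes) + 1)
               for c in combinations(sorted(nodes), r)]
    for T in subsets:
        for S in (s for s in subsets if s <= T):
            for v in nodes - T:
                gain_S = reach_size(live, S | {v}) - reach_size(live, S)
                gain_T = reach_size(live, T | {v}) - reach_size(live, T)
                if gain_S < gain_T:
                    return False
    return True

live = {"a": ["b"], "b": ["c"], "d": ["c", "e"]}  # hypothetical live-edge graph G_L
print(check_reach_submodular(live, {"a", "b", "c", "d", "e"}))  # True
```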

Influence spread σ (S ) is submodular in both IC and LT models.


Submodularity in the general threshold model
Theorem²:
• In the general threshold model (see [Kempe, Kleinberg and Tardos, KDD 2003]),
  o if for every v ∈ V, f_v(·) is monotone and submodular with f_v(∅) = 0,
  o and the reward function r(·) is monotone and submodular,
• then the general influence spread function σ(·) is monotone and submodular.
Local submodularity implies global submodularity.

² Mossel & Roch, STOC 2007
