Information Influence and Its Application in Social Networks
Users connect with friends, fans, followers, etc., and perform actions such as commenting, linking, rating, liking, reposting, and sending photos or videos.
Social network
• What is a social network?
• A graph consisting of individuals and their relationships
• It plays a fundamental role as a medium for the spread of information
• For example, the promotion of a new cell phone or a political movement
• Viral marketing strategy
• Convince a subset of individuals to adopt a new product to trigger a large cascade of further adoptions
• Which set of individuals should we target?
Influence Maximization
• Inactive user: has not adopted the product yet
• Initial adopter: paid by the merchant
• Subsequent adopter: influenced by friends in the social network
• Influence spread = #(initial adopters) + #(subsequent adopters)
• $k$-seed set $S$
• A set of $k$ individuals (i.e., nodes) who adopt a new product and initiate the influence propagation
• Influence spread of a seed set $S$, $\sigma(S)$
• The number of users who adopt the information, triggered by $S$
• In order to estimate $\sigma(S)$, a model that describes how influence spreads is needed
Preliminary
• Diffusion model
• A model that describes how influence propagates over the network
• Linear threshold (LT) model
• Independent cascade (IC) model
• Common assumptions
• A node can have either of two states, active or inactive
• Active user: A user who adopts the information
• Inactive user: A user who does not adopt the information
• Inactive nodes can become active
• But active nodes cannot become inactive
Linear Threshold (LT) Model
• Diffusion process in the LT model
• Each node chooses a threshold uniformly at random from [0, 1]
• A node becomes active if the total weight of its active neighbors is at least its threshold
• The process stops when no further node becomes active
[Figure: LT diffusion example on nodes U, V, W, X, Y with weighted edges; legend: inactive node, active node, active neighbors. The process stops when no further node can be activated.]
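The LT diffusion process described above can be sketched as a short Python simulation. This is a minimal illustration, not the original authors' implementation: the graph format (a dict from each node to its weighted in-neighbors), the function name `simulate_lt`, and the assumption that each node's incoming weights sum to at most 1 are choices made here. It uses the activation rule that a node becomes active once the total weight of its active in-neighbors reaches its threshold.

```python
import random


def simulate_lt(graph, seeds, rng):
    """Run the Linear Threshold model once and return the final active set.

    graph: dict mapping every node to a dict {in_neighbor: weight}.
           Assumption: each node's incoming weights sum to at most 1.
    seeds: the initially active nodes.
    rng:   a random.Random instance, used to draw the thresholds.
    """
    # Each node draws its threshold uniformly at random from [0, 1).
    thresholds = {v: rng.random() for v in graph}
    active = set(seeds)
    changed = True
    while changed:  # stop as soon as a full pass activates no new node
        changed = False
        for v, in_weights in graph.items():
            if v in active:
                continue  # active nodes never become inactive
            # Total weight of v's currently active in-neighbors.
            weight = sum(w for u, w in in_weights.items() if u in active)
            if weight >= thresholds[v]:
                active.add(v)
                changed = True
    return active


# Usage on a small hypothetical graph (node names and weights are made up).
graph = {
    "x": {},
    "u": {"x": 0.4},
    "w": {"x": 0.5},
    "v": {"u": 0.5, "w": 0.3},
}
print(sorted(simulate_lt(graph, {"x"}, random.Random(42))))
```

Because the thresholds are redrawn on every call, runs with different `rng` seeds can end with different active sets; this is the randomness that Monte-Carlo estimation averages out.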
Independent Cascade (IC) Model
• Diffusion process in the IC model
• A newly active node is given a single chance to activate each of its inactive neighbors
• The activation attempts are independent of each other
[Figure: IC diffusion example on nodes U, V, W, X, Y with edge activation probabilities; legend: inactive node, active node, newly active node, successful attempt, unsuccessful attempt. The process stops when no new node is activated.]
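The IC diffusion process can likewise be sketched in a few lines. This is an illustrative sketch under assumptions made here: the graph is a dict from each node to its out-neighbors with per-edge activation probabilities, and `simulate_ic` is a hypothetical name. A queue of newly active nodes ensures that each node attempts to activate each neighbor exactly once.

```python
import random
from collections import deque


def simulate_ic(graph, seeds, rng):
    """Run the Independent Cascade model once and return the final active set.

    graph: dict mapping each node to a dict {out_neighbor: activation_probability}.
    seeds: the initially active nodes.
    rng:   a random.Random instance used for the activation coin flips.
    """
    active = set(seeds)
    newly_active = deque(active)
    while newly_active:  # stop once no new node was activated
        u = newly_active.popleft()
        # u gets exactly one chance to activate each currently inactive neighbor.
        for v, p in graph.get(u, {}).items():
            # Each attempt is an independent Bernoulli trial with success prob. p.
            if v not in active and rng.random() < p:
                active.add(v)
                newly_active.append(v)
    return active


# Usage on a small hypothetical graph (probabilities are made up).
graph = {"x": {"u": 0.4, "w": 0.5}, "u": {"v": 0.5}, "w": {"v": 0.3}, "v": {}}
print(sorted(simulate_ic(graph, {"x"}, random.Random(7))))
```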
Estimating influence spread, $\sigma(S)$
• Randomness of diffusion models
• The influence spread of a seed set $S$ can vary from run to run
• Due to each node's random threshold in the LT model
• Due to the random outcomes of activation attempts by newly active nodes in the IC model
• Monte-Carlo (MC) simulations
• Obtain the final result by averaging the results of many simulations (e.g., over 10,000)
• The process of estimating $\sigma(S)$ using MC simulations is as follows:
• Set the nodes in $S$ to the active state
• Propagate the influence from $S$ according to the given diffusion model
• Repeat the simulation (e.g., over 10,000 times) and estimate $\sigma(S)$ as the average number of active nodes
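The MC procedure above can be sketched as follows, here with the IC model as the diffusion model. This is a minimal sketch; the function names, the dict-based graph format, and the default of 10,000 runs are assumptions chosen for illustration.

```python
import random


def ic_cascade(graph, seeds, rng):
    """Size of one IC run; graph maps node -> {out_neighbor: probability}."""
    active = set(seeds)
    frontier = list(active)  # newly active nodes that still get their one chance
    while frontier:
        u = frontier.pop()
        for v, p in graph.get(u, {}).items():
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return len(active)


def estimate_spread(graph, seeds, runs=10_000, seed=0):
    """Monte-Carlo estimate of sigma(S): the average cascade size over `runs` runs."""
    rng = random.Random(seed)
    total = sum(ic_cascade(graph, seeds, rng) for _ in range(runs))
    return total / runs


# Seed "a" always counts; "b" is activated with probability 0.5,
# so the estimate should be close to 1.5.
print(estimate_spread({"a": {"b": 0.5}, "b": {}}, {"a"}))
```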
Optimal solution to IM problem
• All $\binom{n}{k}$ possible $k$-seed sets must be compared to find the optimal solution
• $n$ is the number of nodes in the network
• Finding the optimal solution to IM is NP-hard
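To show why exhaustive search does not scale, the sketch below enumerates every $k$-subset with `itertools.combinations` and scores each one with a caller-supplied spread function (a hypothetical stand-in for an MC estimate of $\sigma(S)$). `math.comb` requires Python 3.8 or later.

```python
import itertools
import math


def brute_force_im(nodes, k, spread):
    """Exhaustive search for the best k-seed set.

    spread: any function mapping a frozenset of seeds to its (estimated)
            influence spread; it is called C(n, k) times.
    """
    best_set, best_value = None, float("-inf")
    for combo in itertools.combinations(nodes, k):
        candidate = frozenset(combo)
        value = spread(candidate)
        if value > best_value:
            best_set, best_value = candidate, value
    return best_set, best_value


# Number of candidate seed sets for only n = 100 nodes and k = 5 seeds:
print(math.comb(100, 5))  # 75287520
```

With an MC estimate as `spread`, each of those evaluations would itself average thousands of simulations.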
Submodularity for approximation
• Try to identify broad subclasses where good approximation is possible
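• Influence function: $\sigma(S)$ = expected number of active nodes at the end of the process, starting with the seed set $S$ active
• $f$ is submodular if for all $S \subseteq T$ and $v \notin T$: $f(S \cup \{v\}) - f(S) \ge f(T \cup \{v\}) - f(T)$
• $f$ is monotone if for all $S \subseteq T$: $f(S) \le f(T)$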
• A reduction shows that $\sigma$ is submodular (under the IC and LT models)
• Theorem [Nemhauser et al. 1978]: if $f$ is monotone and submodular, $S^*$ is an optimal $k$-element subset, and $S$ is obtained by greedily adding the $k$ elements that maximize the marginal increase, then $f(S) \ge (1 - 1/e)\, f(S^*)$.
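The two definitions can be checked by brute force on a small ground set. The sketch below is illustrative only: it tests a coverage function (the number of elements covered by the union of the chosen sets), which is the kind of function a reachability reduction yields, against a function that is not submodular. The helper names and example sets are hypothetical.

```python
import itertools


def subsets(items):
    """All subsets of `items` as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in itertools.combinations(items, r)]


def is_monotone(f, ground):
    """Check f(S) <= f(T) for all S <= T."""
    return all(f(S) <= f(T) for S in subsets(ground) for T in subsets(ground) if S <= T)


def is_submodular(f, ground):
    """Check f(S + v) - f(S) >= f(T + v) - f(T) for all S <= T and v not in T."""
    for T in subsets(ground):
        for S in subsets(T):
            for v in ground - T:
                if f(S | {v}) - f(S) < f(T | {v}) - f(T):
                    return False
    return True


# Hypothetical coverage function: each node "covers" a set of elements.
reach = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}


def coverage(S):
    return len(set().union(*(reach[v] for v in S)))


ground = frozenset(reach)
print(is_monotone(coverage, ground), is_submodular(coverage, ground))  # True True
print(is_submodular(lambda S: len(S) ** 2, ground))  # False
```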
Solutions for IM
• Approximate Algorithms
• Greedy
• Improved greedy algorithms: CELF/CELF++, NewGreedy, …
• Heuristic Algorithms
Greedy Algorithm
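Algo: Greedy optimization
Input: Graph $G = (V, E)$, number of seeds $k$
Output: Maximum influence set $S$
1: initialize $S = \emptyset$
2: for $i = 1$ to $k$ do
3:   select $u = \arg\max_{w \in V \setminus S} \left(\sigma(S \cup \{w\}) - \sigma(S)\right)$
4:   $S = S \cup \{u\}$
5: end for
6: return $S$
• For any monotone and submodular set function $f$ with $f(\emptyset) = 0$, the greedy algorithm has an approximation ratio of $1 - 1/e$ (~63%), i.e., $f(S) \ge (1 - 1/e)\, f(S^*)$, where $S$ is the output of the greedy algorithm and $S^*$ is the optimal solution.
• The influence spread of the seed set selected by Greedy is considered the ground truth.
• On a network of 15,000 nodes, Greedy takes a few days to complete.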
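The greedy pseudocode can be sketched in Python with an MC estimate of $\sigma$ under the IC model. This is an assumption-laden sketch, not the slides' implementation: the names `mc_spread` and `greedy` and the default of 1,000 runs per evaluation are illustrative. Each iteration costs $n \times$ `runs` cascade simulations, which is why Greedy is so slow on large networks.

```python
import random


def mc_spread(graph, seeds, runs, rng):
    """Monte-Carlo estimate of the IC influence spread of `seeds`."""
    total = 0
    for _ in range(runs):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            u = frontier.pop()
            for v, p in graph.get(u, {}).items():
                if v not in active and rng.random() < p:
                    active.add(v)
                    frontier.append(v)
        total += len(active)
    return total / runs


def greedy(graph, k, runs=1000, seed=0):
    """Add the node with the largest estimated marginal gain, k times."""
    rng = random.Random(seed)
    S, spread_S = [], 0.0
    for _ in range(k):
        best, best_spread = None, float("-inf")
        for w in graph:
            if w in S:
                continue
            s = mc_spread(graph, S + [w], runs, rng)
            # sigma(S) is fixed within this iteration, so maximizing
            # sigma(S + w) also maximizes the marginal gain sigma(S + w) - sigma(S).
            if s > best_spread:
                best, best_spread = w, s
        S.append(best)
        spread_S = best_spread
    return S, spread_S


# Hypothetical example graph: node -> {out_neighbor: probability}.
graph = {"a": {"b": 0.5, "c": 0.5}, "b": {"d": 0.3}, "c": {}, "d": {"e": 0.9}, "e": {}}
print(greedy(graph, 2, runs=200))
```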
CELF Algorithm
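• An improvement to the (already implemented) greedy algorithm
• Uses the submodularity property: when adding $u$ into $S$, the incremental influence spread $\sigma(S \cup \{u\}) - \sigma(S)$ is larger if $S$ is smaller
• Marginal gains computed in earlier iterations are therefore upper bounds, so a gain is recomputed only when its node reaches the top of the priority queue
• Same influence spread as Greedy, 700 times faster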
CELF Algorithm
• Greedy and CELF return the same solution.
• Learn more: https://colab.research.google.com/drive/1vTLAqj77-lPUoFjdNQwDhm1Cg9uJcrTz?usp=sharing
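The lazy-evaluation idea behind CELF can be sketched with Python's `heapq`. This sketch is generic rather than IM-specific: `spread` is any monotone submodular set function supplied by the caller (in IM it would be an MC estimate of $\sigma$), and the example uses a hypothetical deterministic coverage function so that the result is reproducible. Nodes must be mutually comparable because heap ties are broken by node.

```python
import heapq


def celf(nodes, k, spread):
    """CELF (lazy-forward) greedy selection of k seeds.

    spread: function mapping a frozenset of seeds to its influence spread;
            assumed monotone and submodular.
    """
    S = frozenset()
    spread_S = spread(S)
    # Max-heap via negated gains: (-gain, node, size of S when gain was computed).
    heap = [(-(spread(frozenset([v])) - spread_S), v, 0) for v in nodes]
    heapq.heapify(heap)
    for _ in range(min(k, len(heap))):
        while True:
            neg_gain, v, computed_for = heapq.heappop(heap)
            if computed_for == len(S):
                # Gain is exact for the current S and every other stored gain is
                # an upper bound, so v has the largest marginal gain: select it.
                S = S | {v}
                spread_S -= neg_gain
                break
            # Stale gain: by submodularity it can only have shrunk, so recompute
            # it for the current S and push the node back.
            gain = spread(S | {v}) - spread_S
            heapq.heappush(heap, (-gain, v, len(S)))
    return S, spread_S


# Hypothetical coverage function standing in for sigma(S).
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}


def cover(S):
    return len(set().union(*(reach[v] for v in S)))


print(celf(list(reach), 2, cover))
```

In this example, selecting the second seed only recomputes the gain of `"a"`; the stale gains of `"b"` and `"d"` are never re-evaluated because they stay below it.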
NewGreedyIC Algorithm
• For the IC model, construct a graph $G'$
• Obtain $G'$ by removing each edge of $G$ with probability $1 - p$, so that only the edges along which propagation succeeds remain
• Use DFS/BFS to find the set of vertices reachable from $S$ in $G'$
• 15-34% shorter runtime
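The sampling idea above can be sketched as follows: each round samples one graph $G'$ and reuses it to score every candidate, instead of running separate simulations per candidate. This is a simplified sketch in the spirit of NewGreedyIC, not the published algorithm. The function names are hypothetical, and for clarity it runs one BFS per candidate, whereas a faster implementation would share that work across candidates. A candidate's gain in $G'$ is the number of nodes it reaches that are not already reachable from $S$, computed by a BFS that does not expand into nodes reachable from $S$.

```python
import random
from collections import deque


def sample_live_graph(graph, rng):
    """G': keep each edge (u, v) with probability p, i.e., remove it w.p. 1 - p."""
    return {u: [v for v, p in nbrs.items() if rng.random() < p]
            for u, nbrs in graph.items()}


def reachable(g_prime, sources, blocked=frozenset()):
    """BFS: nodes reachable from `sources` in G', never expanding into `blocked`."""
    seen = {s for s in sources if s not in blocked}
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in g_prime.get(u, []):
            if v not in seen and v not in blocked:
                seen.add(v)
                queue.append(v)
    return seen


def new_greedy_ic(graph, k, rounds=200, seed=0):
    """graph: node -> {out_neighbor: probability}; returns k seeds."""
    rng = random.Random(seed)
    S = []
    for _ in range(k):
        score = {v: 0 for v in graph if v not in S}
        for _ in range(rounds):
            g_prime = sample_live_graph(graph, rng)
            reached = reachable(g_prime, S)  # R_G'(S)
            for v in score:
                if v not in reached:  # nodes already reached add nothing
                    score[v] += len(reachable(g_prime, [v], blocked=reached))
        S.append(max(score, key=score.get))
    return S
```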
NewGreedyIC Algorithm
• Time complexity:
DegreeDiscountIC
• Consider an edge $(u, v)$, with $v$ in the seed set and $u$ being considered as a seed
• Since $v$ is already in the seed set, that neighbor should not be counted towards $u$'s degree
• The same holds for all of $u$'s neighbors in the seed set
• Let $G_u$ be the subgraph consisting of $u$ and all of its neighbors, with only the edges from $u$ to its neighbors
• $t_u$ is the number of neighbors of $u$ already in the seed set
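The degree-discount idea can be sketched in Python. As an assumption, this sketch uses the discounted-degree update commonly given for DegreeDiscountIC on an undirected graph with a uniform propagation probability $p$: $dd_u = d_u - 2t_u - (d_u - t_u)\,t_u\,p$, where $d_u$ is $u$'s degree and $t_u$ is the number of its neighbors already in the seed set. For simplicity, it picks the maximum with a linear scan instead of a priority queue, and the function name is hypothetical.

```python
def degree_discount_ic(graph, k, p):
    """DegreeDiscountIC seed selection (sketch).

    graph: undirected graph as a dict mapping each node to the set of its neighbors.
    k:     number of seeds to select.
    p:     uniform propagation probability of the IC model (assumption).
    """
    degree = {v: len(nbrs) for v, nbrs in graph.items()}
    t = {v: 0 for v in graph}  # t_v: number of v's neighbors already selected
    dd = dict(degree)          # discounted degree, starts as the plain degree
    seeds, chosen = [], set()
    for _ in range(min(k, len(graph))):
        # Pick the not-yet-selected node with the largest discounted degree.
        u = max((v for v in graph if v not in chosen), key=lambda v: dd[v])
        seeds.append(u)
        chosen.add(u)
        for v in graph[u]:
            if v in chosen:
                continue
            t[v] += 1
            d = degree[v]
            # Discount v's degree for its t_v neighbors that are already seeds.
            dd[v] = d - 2 * t[v] - (d - t[v]) * t[v] * p
    return seeds


# A star centered at "a" plus a separate edge e-f (hypothetical example).
graph = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"},
         "e": {"f"}, "f": {"e"}}
print(degree_discount_ic(graph, 2, 0.01))  # ['a', 'e']
```

After `"a"` is chosen, the leaves of the star are discounted below `"e"` and `"f"`, so the second seed comes from the other component, as the discount argument intends.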
DegreeDiscountIC
• Time complexity:
• Heuristics run in milliseconds and achieve nearly the same influence spread
Experiments
• Datasets

Dataset   | NetHEPT | Wiki-Vote | Cit-HepPh | Cit-HepTh
No. nodes | 15,233  | 7,115     | 34,546    | 27,770