
Information Influence and Its Application in Social Networks


Project No. 201
Members

Hồ Phương Thảo – 2001206988

Biện Thanh Nhựt – 2001207176

Đỗ Thế Sang – 2001203004


Agenda
• An introduction to Social Networks.
• Influence Maximization (IM)
• IM Problem
• Viral marketing
• Linear Threshold Model
• Independent Cascade Model
• Submodularity
• Solutions for IM
• Experiments
• Conclusion and Future Works
Online Social Networking Sites
Information Propagation
• Nodes represent characters and
edges represent conversational
interactions
• Sims, M. and Bamman, D., 2020. Measuring information propagation in literary social
networks. arXiv preprint arXiv:2004.13980.
Information Propagation
• People are connected: friends, fans, followers, etc.
• They perform actions: comment, link, rate, like, repost, send photos or videos, etc.
Social network
• What is a social network?
• A graph consisting of individuals and their relationships
• It plays a fundamental role as a medium for the spread of information
• For example, a new cell phone promotion or a political movement
• Viral marketing strategy
• Convincing a subset of individuals to adopt a new product in order to trigger a large cascade of further adoptions
• Which set of individuals should we target?
Influence Maximization
• Motivated by viral marketing in social networks
• A followee can influence a follower
• Inactive user: has not adopted the product yet
• Initial adopter: paid by the merchant
• Subsequent adopter: influenced by friends
• The merchant pays a few individuals and hopes that word of mouth promotes the product and creates a cascade of influence
Influence Maximization
• Influence spread = #(initial adopters) + #(subsequent adopters)
• Problem: for a fixed budget $k$, how should the merchant pick $k$ individuals so that the eventual influence spread is maximized?
Influence Maximization
• Given a graph $G = (V, E)$ and a budget $k$,
• find the $k$-seed set $S$ that maximizes the influence spread $\sigma(S)$ over $G$
• $k$-seed set
• A set of $k$ individuals (i.e., nodes) who adopt a new product and initiate the influence propagation
• Influence spread of a seed set $S$, $\sigma(S)$
• The expected number of users who adopt the information when the propagation starts from $S$
• To estimate $\sigma(S)$, a model that describes how influence spreads is needed
Preliminary
• Diffusion model
• A model that describes how influence propagates over the network
• Linear threshold (LT) model
• Independent cascade (IC) model
• Common assumptions
• A node is in one of two states, active or inactive
• Active user: a user who has adopted the information
• Inactive user: a user who has not adopted the information
• Inactive nodes can become active
• But active nodes cannot become inactive
Linear Threshold (LT) Model
• Diffusion process in the LT model
• Each node $v$ chooses a threshold $\theta_v$ uniformly at random from $[0, 1]$
• A node becomes active if the total weight of its active neighbors is at least its threshold
[Figure: example of LT diffusion over nodes U, V, W, X, Y with edge weights and node thresholds; legend: inactive node, active node, threshold, active neighbors]
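The LT process described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the function name `simulate_lt`, the `in_neighbors` dictionary layout, and the toy graph are assumptions made for the example, and incoming weights are assumed to sum to at most 1 per node.

```python
import random

def simulate_lt(in_neighbors, seeds, rng):
    """Run one Linear Threshold cascade and return the set of active nodes.

    in_neighbors maps each node v to a dict {u: weight of edge u -> v}
    (hypothetical layout chosen for this sketch).
    """
    # Each node draws its threshold uniformly at random from [0, 1).
    thresholds = {v: rng.random() for v in in_neighbors}
    active = set(seeds)
    changed = True
    while changed:  # repeat until no new node becomes active
        changed = False
        for v, weights in in_neighbors.items():
            if v in active:
                continue
            # Total weight coming from neighbors that are already active.
            total = sum(w for u, w in weights.items() if u in active)
            if total > 0 and total >= thresholds[v]:
                active.add(v)
                changed = True
    return active

# Toy graph: a -> b -> c with weight 1.0, and c -> d with weight 0.0.
toy = {"a": {}, "b": {"a": 1.0}, "c": {"b": 1.0}, "d": {"c": 0.0}}
result = simulate_lt(toy, ["a"], random.Random(0))
print(sorted(result))
```

Because active nodes never become inactive, the order in which nodes are checked does not change the final active set, so the sketch uses a simple repeated sweep instead of explicit time steps.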
Independent Cascade (IC) Model
• Diffusion process in the IC model
• A newly active node is given a single chance to activate each of its inactive neighbors
• The activation attempts are independent of each other
[Figure: example of IC diffusion over nodes U, V, W, X, Y with edge probabilities; legend: inactive node, active node, newly active node, successful attempt, unsuccessful attempt]
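One IC cascade can be sketched as a breadth-first traversal in which every edge is tried at most once. The function name `simulate_ic`, the `out_neighbors` layout, and the toy graph below are assumptions for illustration only.

```python
import random
from collections import deque

def simulate_ic(out_neighbors, seeds, rng):
    """Run one Independent Cascade and return the set of active nodes.

    out_neighbors maps each node u to a dict {v: activation probability}
    (hypothetical layout chosen for this sketch).
    """
    active = set(seeds)
    frontier = deque(seeds)  # newly active nodes that have not tried yet
    while frontier:
        u = frontier.popleft()
        # u gets exactly one attempt on each currently inactive neighbor.
        for v, prob in out_neighbors.get(u, {}).items():
            if v not in active and rng.random() < prob:
                active.add(v)
                frontier.append(v)
    return active

# Toy graph with certain (1.0) and impossible (0.0) edges.
toy = {"a": {"b": 1.0, "c": 0.0}, "b": {"d": 1.0}}
result = simulate_ic(toy, ["a"], random.Random(0))
print(sorted(result))
```

Each node enters the frontier once, when it becomes active, which is what enforces the single-chance rule.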
Estimating influence spread, $\sigma(S)$
• Randomness of diffusion models
• The influence spread of a seed set can change from run to run,
• because each node's threshold is random in the LT model
• because the activation attempts of newly active nodes are random in the IC model
• Monte Carlo (MC) simulations
• The final result is obtained by averaging over many simulations (e.g., 10,000 or more)
• Estimating $\sigma(S)$ with MC simulations proceeds as follows
• Set the nodes in $S$ to the active state
• Propagate the influence from $S$ according to the given diffusion model
• Estimate the influence spread of $S$ by averaging the results over all simulations
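The estimation steps above can be sketched as follows, assuming the IC model with a single uniform propagation probability `p`. The helper names `ic_spread_once` and `estimate_spread` and the adjacency-list layout are assumptions made for this example.

```python
import random

def ic_spread_once(graph, seeds, p, rng):
    """One IC simulation; returns the number of active nodes at the end."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        u = frontier.pop()
        for v in graph.get(u, ()):
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return len(active)

def estimate_spread(graph, seeds, p, runs=10_000, seed=0):
    """Average the final number of active nodes over `runs` simulations."""
    rng = random.Random(seed)
    total = sum(ic_spread_once(graph, seeds, p, rng) for _ in range(runs))
    return total / runs

# Single edge 0 -> 1: the exact expected spread from {0} is 1 + p.
single_edge = {0: [1]}
approx = estimate_spread(single_edge, [0], p=0.5)
print(approx)
```

On the single-edge graph the exact value is 1.5, so the printed estimate gives a quick sanity check on how close 10,000 runs get.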
Optimal solution to IM problem
• Finding the optimal solution requires comparing all $\binom{n}{k}$ possible $k$-seed sets
• $n$ is the number of nodes in the network
• Finding the optimal solution to IM is NP-hard
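To get a feel for why exhaustive comparison is hopeless, the number of candidate seed sets can be computed directly. The budget `k = 50` below is a hypothetical value chosen only for illustration; `n` is the NetHEPT node count listed in the dataset table of this deck.

```python
import math

def num_seed_sets(n: int, k: int) -> int:
    """Number of distinct k-seed sets in a network of n nodes."""
    return math.comb(n, k)

n = 15233  # NetHEPT node count from the dataset table
k = 50     # hypothetical budget, not taken from the slides
count = num_seed_sets(n, k)
print(f"C({n}, {k}) has {len(str(count))} decimal digits")
```

Each of these sets would additionally need thousands of MC simulations to evaluate, which is why the remaining slides turn to approximation and heuristics.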
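Submodularity for approximation
• Try to identify broad subclasses where good approximation is possible
• Influence function: $\sigma(S)$ = expected number of active nodes at the end of the process starting with $S$
• $\sigma$ is submodular if for $S \subseteq T \subseteq V$ and $v \notin T$: $\sigma(S \cup \{v\}) - \sigma(S) \ge \sigma(T \cup \{v\}) - \sigma(T)$
• $\sigma$ is monotone if for $S \subseteq T$: $\sigma(S) \le \sigma(T)$
• A hardness reduction shows that $\sigma$ is not submodular in general (e.g., under arbitrary threshold functions); under the IC and LT models, however, $\sigma$ is monotone and submodular
• Theorem [Nemhauser et al. 1978]: let $\sigma$ be monotone and submodular, $S^*$ an optimal $k$-element subset, and $S$ the set obtained by greedily adding the elements that maximize the marginal increase; then $\sigma(S) \ge (1 - 1/e)\,\sigma(S^*)$.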
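The two definitions can be checked by brute force on a small ground set. The sketch below uses a set-coverage function as a stand-in for the influence function (an assumption for illustration; coverage is a standard example of a monotone submodular function), and all names in it are hypothetical.

```python
from itertools import combinations

def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_monotone(f, ground):
    """Check f(S) <= f(T) for all S subset of T."""
    return all(f(S) <= f(T) for T in subsets(ground) for S in subsets(T))

def is_submodular(f, ground):
    """Check diminishing returns for all S subset of T and v outside T."""
    ground = frozenset(ground)
    for T in subsets(ground):
        for S in subsets(T):
            for v in ground - T:
                if f(S | {v}) - f(S) < f(T | {v}) - f(T):
                    return False
    return True

# Coverage: each element "reaches" a set of users; f counts users reached.
reach = {0: {1, 2}, 1: {2, 3}, 2: {3, 4}}

def coverage(chosen):
    covered = set()
    for i in chosen:
        covered |= reach[i]
    return len(covered)

def squared_size(chosen):
    # Marginal gains grow with the set size, so this is not submodular.
    return len(chosen) ** 2

print(is_monotone(coverage, reach), is_submodular(coverage, reach))
print(is_submodular(squared_size, reach))
```

The exhaustive check is exponential in the ground-set size, so it is only meant for building intuition on toy inputs.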
Solutions for IM
• Approximate Algorithms
• Greedy
• Improved greedy algorithms: CELF/CELF++, NewGreedy, …
• Heuristic Algorithms
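Greedy Algorithm
Algo: Greedy optimization
Input: Graph $G = (V, E)$, budget $k$
Output: Maximum influence set $S$
  initialize $S = \emptyset$
  for $i = 1$ to $k$ do
    select $u = \arg\max_{v \in V \setminus S} \left( \sigma(S \cup \{v\}) - \sigma(S) \right)$
    $S = S \cup \{u\}$
  end for
  return $S$
• For any monotone and submodular set function $f$ with $f(\emptyset) = 0$, the greedy algorithm has an approximation ratio of $f(S) \ge (1 - 1/e)\, f(S^*)$ (~63%), where $S$ is the output of the greedy algorithm and $S^*$ is the optimal solution.
• The influence spread of the seed set returned by Greedy is considered the ground truth.
• On a network of 15,000 nodes, Greedy takes a few days to complete.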
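A runnable sketch of the greedy loop, assuming the IC model with a uniform probability `p` and an MC estimate of the spread. The names `ic_spread` and `greedy_im`, the default parameters, and the toy graph are assumptions for this example; the sketch also assumes `k` does not exceed the number of nodes.

```python
import random

def ic_spread(graph, seeds, p, runs, rng):
    """MC estimate of the IC influence spread of `seeds`."""
    total = 0
    for _ in range(runs):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            u = frontier.pop()
            for v in graph.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    frontier.append(v)
        total += len(active)
    return total / runs

def greedy_im(graph, nodes, k, p=0.1, runs=200, seed=0):
    """Pick k seeds, each time adding the node with the best estimated spread."""
    rng = random.Random(seed)
    seeds, spread = [], 0.0
    for _ in range(k):
        best_node, best_spread = None, -1.0
        for v in nodes:
            if v in seeds:
                continue
            candidate = ic_spread(graph, seeds + [v], p, runs, rng)
            if candidate > best_spread:
                best_node, best_spread = v, candidate
        seeds.append(best_node)
        spread = best_spread
    return seeds, spread

# Two stars (0 -> 1,2,3 and 4 -> 5,6) plus an isolated node 7.
toy = {0: [1, 2, 3], 4: [5, 6]}
seeds, spread = greedy_im(toy, range(8), k=2, p=1.0, runs=5)
print(seeds, spread)
```

Every round re-estimates the spread for every remaining node, which is the cost that the lazy and sampling-based variants on the following slides try to avoid.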
CELF Algorithm
• An improvement on the greedy algorithm described above
• Uses the submodularity property: when adding a node $v$ into a seed set $S$, the incremental influence spread that results from adding $v$ is larger if $S$ is smaller
• Same influence spread, roughly 700 times faster
• Greedy and CELF return the same solution $S$.
• Learn more: https://colab.research.google.com/drive/1vTLAqj77-lPUoFjdNQwDhm1Cg9uJcrTz?usp=sharing
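The lazy-evaluation idea behind CELF can be sketched with a priority queue of possibly stale marginal gains. This is an illustrative sketch under the IC model with a uniform probability `p`, not the code in the linked notebook; the names `ic_spread` and `celf_im` are assumptions, and node identifiers are assumed to be comparable (e.g., integers) so that heap ties can be broken.

```python
import heapq
import random

def ic_spread(graph, seeds, p, runs, rng):
    """MC estimate of the IC influence spread of `seeds`."""
    total = 0
    for _ in range(runs):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            u = frontier.pop()
            for v in graph.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    frontier.append(v)
        total += len(active)
    return total / runs

def celf_im(graph, nodes, k, p=0.1, runs=200, seed=0):
    """Lazy greedy: only re-evaluate the node currently at the top of the heap."""
    rng = random.Random(seed)
    # Heap entries: (-marginal gain, node, seed-set size when gain was computed).
    heap = [(-ic_spread(graph, [v], p, runs, rng), v, 0) for v in nodes]
    heapq.heapify(heap)
    seeds, spread = [], 0.0
    while len(seeds) < k and heap:
        neg_gain, v, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # Gain is up to date; by submodularity no stale entry can beat it.
            seeds.append(v)
            spread += -neg_gain
        else:
            # Gain is stale: recompute against the current seed set and re-insert.
            gain = ic_spread(graph, seeds + [v], p, runs, rng) - spread
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds, spread

toy = {0: [1, 2, 3], 4: [5, 6]}
seeds, spread = celf_im(toy, range(8), k=2, p=1.0, runs=5)
print(seeds, spread)
```

With noisy MC estimates the lazy and the plain greedy selections can differ slightly in practice; they coincide exactly only when the spread values are exact, as in this deterministic toy case.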
NewGreedyIC Algorithm
• For the IC model, construct a graph G'
• Obtain G' from G by removing each edge independently with probability 1-p, i.e., the edges that do not take part in the propagation
• Use DFS/BFS to find the set of vertices reachable from S in G'
• 15-34% shorter runtime
• Time complexity: $O(kRm)$, where $R$ is the number of sampled graphs and $m$ is the number of edges
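The sampling idea can be sketched as below: draw G' once per round and run, then score each candidate by how many extra vertices it reaches. This is a simplified illustration with assumed names (`sample_live_graph`, `reachable`, `new_greedy_ic`); it runs one BFS per candidate, so it does not reach the running time of the published algorithm, which computes all reachable-set sizes in a single pass over G'. It also assumes `k` does not exceed the number of nodes.

```python
import random
from collections import deque

def sample_live_graph(edges, p, rng):
    """Keep each edge with probability p (i.e., remove it with probability 1 - p)."""
    live = {}
    for u, v in edges:
        if rng.random() < p:
            live.setdefault(u, []).append(v)
    return live

def reachable(live, sources):
    """BFS over the sampled graph; returns all vertices reachable from `sources`."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in live.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def new_greedy_ic(nodes, edges, k, p=0.1, runs=200, seed=0):
    rng = random.Random(seed)
    seeds = []
    for _ in range(k):
        score = {v: 0 for v in nodes if v not in seeds}
        for _ in range(runs):
            live = sample_live_graph(edges, p, rng)
            from_seeds = reachable(live, seeds)
            for v in score:
                if v not in from_seeds:
                    # Vertices newly reached if v were added to the seed set.
                    score[v] += len(reachable(live, [v]) - from_seeds)
        seeds.append(max(score, key=score.get))
    return seeds

edges = [(0, 1), (0, 2), (0, 3), (4, 5), (4, 6)]
seeds = new_greedy_ic(range(8), edges, k=2, p=1.0, runs=5)
print(seeds)
```

The key saving over plain greedy is that one sampled graph is shared by all candidates in a round, instead of running separate simulations per candidate.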
DegreeDiscountIC
• Consider an edge $(u, v)$, with $u$ in the seed set and $v$ being considered
• Since $u$ is in the seed set, that neighbor should not be counted towards $v$'s degree
• The same holds for all of $v$'s neighbors in the seed set
• Let $Star(v)$ be the subgraph with $v$ and all of its neighbors, with only the edges from $v$ to its neighbors
• $t_v$ is the number of neighbors of $v$ already in the seed set
• Time complexity: $O(k \log n + m)$
• Heuristics run in milliseconds and achieve nearly the same influence spread as the greedy algorithms
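A compact sketch of the degree-discount rule on an undirected graph, using the discounted degree $dd_v = d_v - 2 t_v - (d_v - t_v)\, t_v\, p$ from the DegreeDiscountIC heuristic. The function name and adjacency layout are assumptions for this example, and the sketch picks the maximum with a linear scan rather than a priority queue, so it does not match the stated time complexity.

```python
def degree_discount_ic(adj, k, p=0.01):
    """Select k seeds by discounted degree.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    dd = dict(degree)        # discounted degree, initially the plain degree
    t = {v: 0 for v in adj}  # number of neighbors already selected as seeds
    seeds, selected = [], set()
    for _ in range(k):
        u = max((v for v in adj if v not in selected), key=dd.get)
        seeds.append(u)
        selected.add(u)
        for v in adj[u]:
            if v not in selected:
                t[v] += 1
                dd[v] = degree[v] - 2 * t[v] - (degree[v] - t[v]) * t[v] * p
    return seeds

# Hub 0 with neighbors 1..4; node 1 also has neighbors 5 and 6; 7 has 8 and 9.
toy = {
    0: {1, 2, 3, 4}, 1: {0, 5, 6}, 2: {0}, 3: {0}, 4: {0},
    5: {1}, 6: {1}, 7: {8, 9}, 8: {7}, 9: {7},
}
seeds = degree_discount_ic(toy, k=2)
print(seeds)
```

In this toy graph a plain degree ranking would pick node 1 second (degree 3), while the discount lowers its score once its neighbor 0 is a seed, so node 7 is chosen instead.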
Experiments
• Datasets

Dataset      NetHEPT   Wiki-Votes   Cit-HepPh   CitHepTh
No. Nodes    15233     7115         34546       27770
No. Edges    58891     103689       521578      352807
Experiments
• Setup
• We evaluate spreading efficiency and runtime under the IC and LT models. The improved greedy algorithms and the heuristic algorithms below are implemented in Python.
• CELF: the original greedy algorithm with the submodularity-based improvement. The number of Monte Carlo simulations for this algorithm is $R = 1500$.
• NewGreedyIC: a new greedy algorithm proposed for the IC model (Algorithm 2), with its own $R$ and propagation probability $p$.
• DegreeHeuristic: a heuristic algorithm that uses a priority queue.
• DegreeDiscountIC: applies the heuristic strategy of Algorithm 4 with the same parameters.
Result
[Figures: result plots for NetHEPT, Cit-HepTh, CA-HepTh, and Wiki-Votes, two slides per dataset]
• The improved algorithms (DegreeDiscount, NewGreedy, and Degree Heuristic) run in near-linear time and are faster than CELF.
• CELF achieves the best spread on datasets such as Cit-HepTh and CA-HepTh, but the other algorithms sometimes surpass CELF in spread while keeping near-linear runtimes. When runtime is not a constraint, consider NewGreedy, DegreeDiscount, and Degree Heuristic while gradually increasing R. On datasets with many nodes, CELF's slow runtime makes it less efficient.
Conclusion
• We studied the IM problem, approaches to solving it, and its application in viral marketing.
• Given the abundance of existing studies on this topic, much of our effort went into catching up with prior work.
Future works
• Applying Reinforcement Learning to Enhance Influence Maximization
• Integration of Deep Learning for Improved Influence Prediction
• Dynamic and Adaptive Strategies for Real-time Influence Maximization
• …
Thank you!
