
IEEE INFOCOM 2017 - IEEE Conference on Computer Communications

Viral Marketing with Positive Influence


Zhao Zhang
Department of Computer Science, Zhejiang Normal University, Jinhua, Zhejiang, 321004, China
Email: hxhzz@sina.com

Yishuo Shi
College of Mathematics and System Sciences, Xinjiang University, Urumqi, Xinjiang, 830046, China

James Willson, Ding-Zhu Du, Guangmo Tong
Department of Computer Science, University of Texas at Dallas, Richardson, TX, 75080, USA

Abstract—One model for viral marketing is positive influence. In this model, an inactive node becomes active if and only if at least half of its neighbors are already active. The positive influence model can be viewed as a special case of a general threshold model, in which the threshold function at each node has value 1 if at least a certain fraction of its neighbors are active, and value 0 otherwise. This function can be proved to be monotonically increasing and nonsubmodular for any predefined fraction. Therefore, given a seed set, the number of influenced nodes is not a submodular function of the seed set. This fact makes the optimization problems related to positive influence very hard, including the minimum partial positive influence seeding problem: given a social network $G = (V, E)$ and a number $0 < p < 1$, find a minimum seed set $S$ which can positively influence at least $p|V|$ nodes.

In this paper, we present an $O((\log n)^2 H(\lceil pn \rceil))$-approximation algorithm for the minimum partial positive influence seeding problem, where $n$ is the number of nodes and $H(\cdot)$ is the Harmonic number.

Keywords: viral marketing, partial positive-influence seeding problem, approximation algorithm.

978-1-5090-5336-0/17/$31.00 ©2017 IEEE

I. INTRODUCTION

With the development of the Internet and social networks, viral marketing has become one of the most important methods of product promotion. In viral marketing, ideas and innovations are carried by influence cascades that spread between users via fast communication channels such as e-mail and online social networks like Facebook. The study of viral marketing can be traced back to Domingos et al. [8]. The well-known influence maximization problem was proposed in the seminal work of Kempe et al. [14]. Following [14], a large body of work has been done on influence diffusion.

In most popular models of viral marketing, an influence cascade is triggered by a set of seed users who are selected to be initially activated and who then influence other users. For example, to advertise a new product, a company would like to offer free samples to a set of initial users who will potentially introduce the product to their friends. Because of cost, only a limited number of samples are available, and thus the company has to select seed users judiciously. From the optimization viewpoint, one should either select the minimum number of users such that the influence reaches a certain level, or select a fixed number of users such that the influence is maximized. In this paper, we consider the former. More precisely, given a ratio $0 < p < 1$, we aim to select the minimum number of seed users such that at least $p \cdot n$ users are activated, where $n$ is the total number of users. We call this problem the partial influence seeding problem.

The diffusion mode varies between influence models. Two basic models, the independent cascade model and the linear threshold model, were first proposed by Kempe et al. [14]. The independent cascade model assumes that each active user has a certain probability of activating each of his/her neighbors, while in the linear threshold model a user becomes active when the influence from its neighbors accumulates beyond a certain threshold. In previous works, the thresholds of users are assumed to be randomly determined. Under such an assumption, the objective function of the influence maximization problem is monotone increasing and submodular, and therefore the classic greedy algorithm provides an approximation with a guaranteed performance ratio. The question is: what if thresholds are known in advance? In fact, hardness results in [3] show that the problem becomes extremely difficult if thresholds are prefixed. In this paper, we consider the threshold model in which a user $u$ is active if and only if at least $r_u$ of its neighbors are in the seed set, and the goal is to find the minimum number of seed users such that at least a $p$ fraction of users are active. We call this the Partial Positive Influence Seeding (PPIS) problem. When thresholds are fixed, the objective function is not submodular, so existing techniques are not applicable to the PPIS problem. Furthermore, prior works [6], [22], [21] mainly focus on the case $p = 1$, which is a special case of the problem considered in this paper.

PPIS is closely related to the classic set cover problem. Given an element set $E$, a collection of subsets $\mathcal{S} \subseteq 2^E$, and a cost $c_S$ on each set $S \in \mathcal{S}$, the classic set cover problem (SC) asks for a minimum-cost sub-collection $\mathcal{F} \subseteq \mathcal{S}$ covering all elements, where an element $e$ is covered by a set $S$ if $e \in S$, and $e$ is covered by a sub-collection $\mathcal{F}$ if $e$ is covered by at least one set of $\mathcal{F}$. SC has many variations, including the partial set cover problem (PSC) and the set multi-cover problem (SMC). In PSC, only a certain number of elements are required to be covered. In SMC, each element $e$ is required to be covered at least $r_e$ times. Combining these two problems, we obtain the partial set multi-cover problem (PSMC), whose goal is to fully cover at least a certain number of elements, where an element $e$ is fully covered by a sub-collection $\mathcal{F}$ if $e$ belongs to at least $r_e$ sets of $\mathcal{F}$. It is easy to see that PPIS is a special case of PSMC: view each node $u$ as a set $S_u$ consisting of all its neighbors, and also as an element which is required to be covered at least $r_u$ times.

There are many studies on PSC and SMC, achieving performance ratios that match the best possible ones for the classic set cover problem. However, studies on the combination of these two problems are rare. According to recent studies [17], [18], this combined problem seems to be extremely difficult, defeating the methods that have succeeded on PSC and SMC.

A. Related Work

The minimum set cover problem (SC) was one of the 21 problems proved to be NP-hard in Karp's seminal paper [12]. In fact, Feige [9] proved that it cannot be approximated within factor $\rho \ln n$ for any $0 < \rho < 1$ unless $NP \subseteq DTIME(n^{O(\log \log n)})$, where $n$ is the number of elements. On the other hand, the greedy strategy achieves performance ratio $H(\Delta) \approx \ln \Delta + 0.57$ [4], [11], [16], where $\Delta$ is the maximum cardinality of a set and $H(\Delta) = 1 + \frac{1}{2} + \cdots + \frac{1}{\Delta}$ is the Harmonic number.

Dobson proposed the minimum set multi-cover problem in [7] and gave an $H(K)$-approximation, where $K$ is the sum of all the covering requirements.

For the minimum partial set cover problem, Kearns [13] gave a greedy algorithm achieving performance ratio $2H(n) + 3$, where $n$ is the size of the ground set. By slightly modifying the greedy algorithm, Slavík [20] improved the performance ratio to $H(\min\{\lceil pn \rceil, \Delta\})$, where $p$ is the fraction of elements required to be covered. Gandhi et al. [10] proposed a primal-dual algorithm achieving performance ratio $f$, where $f$ is the maximum frequency of an element, that is, each element is contained in at most $f$ sets. Bar-Yehuda [1] studied a generalized version in which each element has a profit and the total profit of covered elements should exceed a threshold; using the local ratio method, he also obtained performance ratio $f$. Könemann et al. [15] presented a Lagrangian relaxation framework and obtained performance ratio $(\frac{4}{3} + \epsilon)H(\Delta)$ for the generalized partial set cover problem.

The above results show that both SMC and PSC have achieved performance ratios which match the best ratios for the classic set cover problem. The situation is significantly different for the partial set multi-cover problem. In [17], Ran et al. studied the PPIS problem through the partial set multi-cover problem, achieving performance ratio $\gamma H(\delta_{\max})$, where $\gamma = 1/(1 - (1-p)\eta)$, $\eta \approx \delta_{\max}^2/\delta_{\min}$, and $\delta_{\max}$, $\delta_{\min}$ are the maximum degree and the minimum degree of the graph, respectively. For power-law graphs, they showed that their algorithm has a constant performance ratio. In [18], Ran et al. presented a simple greedy algorithm achieving performance ratio $\Delta$. They also presented a local ratio algorithm for PSMC, which reveals a "shock wave" phenomenon for the combination of partial cover and multi-cover: their performance ratio is $f$ for PSC and SMC, matching the classic results for PSC and SMC, but for PSMC the ratio jumps abruptly to $O(n)$ even when $p$ is smaller than 1 by a very small amount. The PSMC problem thus appears to be very challenging.

B. Our Contribution and Techniques

In this paper, we study the PPIS problem through the study of PSMC. We obtain the following results.

• A new problem called minimum density sub-collection (MDSC) is defined, which is also NP-hard. We show that when the maximum covering requirement $r_{\max}$ is a constant, an $\alpha$-approximation for MDSC can be used to produce an $\alpha(H(\lceil pn \rceil) + 1)$-approximation for PSMC, whether it is a constrained version or an unbounded version. By designing an $O((\log n)^2)$-approximation for constrained MDSC, we obtain performance ratio $O((\log n)^2 (H(\lceil pn \rceil) + 1))$ for constrained PSMC.
• Our performance ratio significantly improves those in [17] and [18]. Notice that the performance ratio obtained in [17] is meaningful only when $(1-p)\delta_{\max}^2/\delta_{\min} < 1$ (which, in many applications, holds only when $p$ is very close to 1), and the ratio is very large even when it is meaningful, whereas our performance ratio is valid without any restriction on $p$. The $\Delta$-approximation for PSMC obtained in [18] also uses a greedy idea, but in a much simpler form. Notice that in the worst case, $\Delta$ can be as large as $n - 1$, whereas our ratio is polylogarithmic in $n$.

The overall idea of our algorithms is greedy. However, as indicated by the previous studies in [17] and [18], a classic greedy algorithm cannot achieve a good approximation. We propose a new algorithm consisting of two stages. In the first stage, the algorithm iteratively picks the most efficient sub-collections until at least $\lceil pn \rceil$ elements are fully covered. The obstacle to a good approximation factor at this stage lies in the last iteration: the sub-collection chosen in the last iteration might cover many more elements than required, so although its density is low, a lot of cost may be wasted on covering excessive elements. Therefore, in the second stage, we further prune the last sub-collection by a greedy strategy combined with a guessing method.

A crucial stepping-stone to the above solution is the MDSC problem, which is also NP-hard; we failed to solve it through a natural LP formulation in the language of sets. However, we manage to formulate the problem as a linear program using a language similar to "flow", and use an approximation algorithm for the minimum node-weighted Steiner network problem as a subroutine.

The paper is organized as follows. In Section II, we give formal definitions of the problems studied in this paper and prove the NP-hardness of MDSC. In Section III, we show how an $\alpha$-approximation algorithm for MDSC leads to an $\alpha(H(\lceil pn \rceil) + 1)$-approximation for PSMC. In Section IV, we present an $O((\log n)^2)$-approximation algorithm for bounded MDSC. In Section V, the paper is concluded with some discussions on future work.
II. PRELIMINARIES

A. Formal Definition of Problem

Definition II.1 (Partial Positive Influence Seeding (PPIS)). Given a graph $G$ with node set $V$ and edge set $E$, and two real numbers $0 < p \le 1$ (called the total influence threshold) and $0 < \mu \le 1$ (called the local influence threshold), find a minimum subset of nodes $C \subseteq V$ (called a seed set) such that at least $p|V|$ nodes are influenced by $C$, where a node $v$ is influenced by $C$ if $v$ has at least $\lceil \mu \cdot deg(v) \rceil$ neighbors in $C$, and $deg(v)$ is the degree of node $v$ in $G$.

PPIS is a special case of the following partial set multi-cover problem. Suppose $E$ is an element set, $\mathcal{S} \subseteq 2^E$ is a collection of subsets of $E$ with $\bigcup_{S \in \mathcal{S}} S = E$, each set $S \in \mathcal{S}$ has a cost $c_S$, and each element $e \in E$ has a positive covering requirement $r_e$. For a sub-collection $\mathcal{S}' \subseteq \mathcal{S}$, denote by $\mathcal{S}'_e = \{S \in \mathcal{S}' : e \in S\}$ the sets of $\mathcal{S}'$ containing element $e$. We write $e \sim \mathcal{S}'$ to denote that element $e$ is fully covered by $\mathcal{S}'$, that is, $|\mathcal{S}'_e| \ge r_e$. The cost of sub-collection $\mathcal{S}'$ is $c(\mathcal{S}') = \sum_{S \in \mathcal{S}'} c(S)$.

Definition II.2 (Partial Set Multi-Cover (PSMC)). Given $E, \mathcal{S}, c, r$ and a nonnegative integer $P$, the PSMC problem is to find a minimum cost sub-collection $\mathcal{S}'$ such that $|\{e \in E : e \sim \mathcal{S}'\}| \ge P$. An instance of PSMC is denoted as $(E, \mathcal{S}, c, r, P)$.

Given a graph $G$ on node set $V$, taking $E = V$, $\mathcal{S} = \{N(v) : v \in V\}$, where $N(v)$ is the set of neighbors of $v$ in $G$, unit cost for every set, $r_v = \lceil \mu \cdot deg(v) \rceil$ for each $v \in V$, and $P = \lceil p|V| \rceil$, we obtain the PPIS problem.

In this paper, we assume that the maximum covering requirement $r_{\max} = \max\{r_e : e \in E\}$ has a constant upper bound.
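The reduction above can be sketched in a few lines of Python. This is a minimal illustration, assuming the graph is given as an adjacency dictionary with hashable node names; the helper names `ppis_to_psmc` and `fully_covered` are hypothetical, not taken from the paper. Each set is indexed by the node that owns it, so choosing index $v$ corresponds to placing $v$ in the seed set.

```python
import math
from typing import Dict, Hashable, Iterable, Set

Node = Hashable


def ppis_to_psmc(adj: Dict[Node, Set[Node]], p, mu):
    """Build the PSMC instance (E, S, c, r, P) of a PPIS instance.

    Hypothetical helper. Passing p and mu as fractions.Fraction avoids
    floating-point rounding inside math.ceil.
    """
    elements = set(adj)                                    # E = V
    sets = {v: set(nbrs) for v, nbrs in adj.items()}       # S_v = N(v)
    cost = {v: 1 for v in adj}                             # every seed costs 1
    req = {v: math.ceil(mu * len(adj[v])) for v in adj}    # r_v = ceil(mu * deg(v))
    P = math.ceil(p * len(adj))                            # P = ceil(p |V|)
    return elements, sets, cost, req, P


def fully_covered(sets, req, chosen: Iterable[Node]) -> Set[Node]:
    """Return C(S'): elements e with |S'_e| >= r_e for the chosen set indices."""
    count: Dict[Node, int] = {}
    for idx in chosen:
        for e in sets[idx]:
            count[e] = count.get(e, 0) + 1
    return {e for e in req if count.get(e, 0) >= req[e]}
```

On the path a-b-c with $\mu = 1/2$, every node needs one seeded neighbor, so seeding b alone influences only a and c, while seeding {a, b} influences all three nodes.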
B. Minimum Density Sub-Collection Problem

For simplicity of statement, we use $C(\mathcal{S}')$ to denote the set of elements fully covered by sub-collection $\mathcal{S}'$. As stated in Subsection I-B, our algorithm chooses a most efficient sub-collection in each iteration, where the efficiency of a sub-collection $\mathcal{S}'$ is measured by its density, defined as

$$den(\mathcal{S}') = c(\mathcal{S}')/|C(\mathcal{S}')|.$$

Definition II.3 (Minimum Density Sub-Collection (MDSC)). Given $E, \mathcal{S}, c, r$, the MDSC problem is to find a sub-collection with the minimum density.

Unfortunately, MDSC is also NP-hard.
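Because MDSC is NP-hard, exhaustive search is only practical on toy instances, but it makes the density objective concrete. The sketch below is an illustrative brute force, not part of the paper's method; it assumes sets are stored as a dictionary from set names to element sets, with costs and requirements as dictionaries, and all function names are hypothetical.

```python
import math
from itertools import combinations


def covered_elements(sets, req, chosen):
    """C(S'): elements whose requirement r_e is met by the chosen sets."""
    count = {}
    for name in chosen:
        for e in sets[name]:
            count[e] = count.get(e, 0) + 1
    return {e for e in req if count.get(e, 0) >= req[e]}


def density(sets, cost, req, chosen):
    """den(S') = c(S') / |C(S')|, or infinity when nothing is fully covered."""
    covered = covered_elements(sets, req, chosen)
    if not covered:
        return math.inf
    return sum(cost[name] for name in chosen) / len(covered)


def brute_force_mdsc(sets, cost, req):
    """Try every non-empty sub-collection (exponential time); keep the densest."""
    best, best_den = None, math.inf
    names = list(sets)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            d = density(sets, cost, req, combo)
            if d < best_den:
                best, best_den = combo, d
    return best, best_den
```

On the instance of Example IV.1 with unit costs (the costs are an assumption; the example does not specify them), the single set $S_1$ fully covers $e_2$ and $e_3$ and has density 1/2, lower than every other sub-collection.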
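Theorem II.4. The MDSC problem is NP-hard.

Proof. We reduce the perfect 3-dimensional matching problem to MDSC. Given an integer $k$, three sets $X, Y, Z$ each of cardinality $k$, and a set $T \subseteq X \times Y \times Z$, the perfect 3-dimensional matching problem asks whether there is a subset $T' \subseteq T$ with $|T'| = k$ such that for any two distinct triples $(x, y, z), (x', y', z') \in T'$, we have $x \ne x'$, $y \ne y'$, and $z \ne z'$. Construct an instance of MDSC as follows. Let $E = X \cup Y \cup Z \cup \{u_0\}$ and $\mathcal{S} = \{\{x, y, z, u_0\} : (x, y, z) \in T\}$. The covering requirements are $r_{u_0} = k$ and $r_u = 1$ for $u \in X \cup Y \cup Z$. The cost is $c(S) = 1$ for all $S \in \mathcal{S}$. Next, we show that there is a perfect 3-dimensional matching if and only if the optimal value of the MDSC instance is $k/(3k+1)$.

In fact, if $T'$ is a perfect 3-dimensional matching, then $\mathcal{S}' = \{\{x, y, z, u_0\} : (x, y, z) \in T'\}$ has $|\mathcal{S}'| = |T'| = k$ and $|C(\mathcal{S}')| = 3k + 1$, so its density is $k/(3k+1)$. Now consider an arbitrary nonempty sub-collection $\mathcal{S}''$ and its corresponding subset $T'' = \{(x, y, z) : \{x, y, z, u_0\} \in \mathcal{S}''\}$. Then $|T''| = |\mathcal{S}''| = c(\mathcal{S}'')$. If $|T''| > k$, then $c(\mathcal{S}'')/|C(\mathcal{S}'')| > k/(3k+1)$, since $|C(\mathcal{S}'')| \le 3k + 1$. If $|T''| < k$, then $u_0 \notin C(\mathcal{S}'')$ and thus $|C(\mathcal{S}'')| \le 3|T''|$; in this case, $c(\mathcal{S}'')/|C(\mathcal{S}'')| \ge |T''|/(3|T''|) > k/(3k+1)$. If $|T''| = k$, then $|C(\mathcal{S}'')| \le 3k + 1$, with equality if and only if the $k$ triples of $T''$ cover all $3k$ elements of $X \cup Y \cup Z$, that is, if and only if $T''$ is a perfect 3-dimensional matching. Hence the minimum density is at least $k/(3k+1)$, and it equals $k/(3k+1)$ if and only if a perfect 3-dimensional matching exists.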
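The reduction in the proof can be checked on a tiny instance. The following Python sketch, with hypothetical helper names, builds the MDSC instance from a 3-dimensional matching instance and computes the minimum density exactly (with `fractions.Fraction`) by brute force. It assumes the element names in $X$, $Y$, $Z$ are distinct strings different from `"u0"`; here $k = 2$, so the threshold value is $2/7$.

```python
from fractions import Fraction
from itertools import combinations


def reduce_3dm_to_mdsc(X, Y, Z, triples):
    """MDSC instance from the proof: S_t = {x, y, z, u0}, unit costs, r_u0 = k."""
    u0 = "u0"
    sets = {t: set(t) | {u0} for t in triples}
    cost = {t: 1 for t in triples}
    req = {a: 1 for a in (*X, *Y, *Z)}
    req[u0] = len(X)                      # r_{u0} = k
    return sets, cost, req


def min_density(sets, cost, req):
    """Exact minimum density over all non-empty sub-collections (exponential)."""
    best = None
    names = list(sets)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            count = {}
            for t in combo:
                for e in sets[t]:
                    count[e] = count.get(e, 0) + 1
            covered = sum(1 for e, need in req.items() if count.get(e, 0) >= need)
            if covered:
                d = Fraction(sum(cost[t] for t in combo), covered)
                if best is None or d < best:
                    best = d
    return best


X, Y, Z = ["x1", "x2"], ["y1", "y2"], ["z1", "z2"]
with_matching = [("x1", "y1", "z1"), ("x2", "y2", "z2"), ("x1", "y2", "z1")]
without_matching = [("x1", "y1", "z1"), ("x1", "y2", "z2"), ("x2", "y1", "z2")]
print(min_density(*reduce_3dm_to_mdsc(X, Y, Z, with_matching)))     # 2/7
print(min_density(*reduce_3dm_to_mdsc(X, Y, Z, without_matching)))  # 1/3
```

In the second instance every pair of triples collides in some coordinate, so no sub-collection reaches density $2/7$, as the case analysis predicts.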
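III. APPROXIMATION ALGORITHM FOR PSMC

Assuming an $\alpha$-approximation algorithm for MDSC, we use this algorithm to design approximation algorithms for PSMC. Approximation algorithms for MDSC will be studied in Section IV.

Since the algorithm is iterative, the set $C(\mathcal{S}')$ refers only to those elements effectively fully covered by $\mathcal{S}'$. That is, for an element $e$ which belongs to a set of $\mathcal{S}'$, if $e$ was already fully covered in previous iterations, or if the covering requirement of $e$ is still not fulfilled after $\mathcal{S}'$ is added, then $e$ is not counted as an element fully covered by $\mathcal{S}'$. In other words, $C(\mathcal{S}')$ can be interpreted as the set of elements whose covering requirements become fulfilled when sub-collection $\mathcal{S}'$ is added.

A. The Algorithm for PSMC

Our algorithm for PSMC consists of two stages. In the first stage, the algorithm uses an $\alpha$-approximation algorithm for MDSC to greedily choose sub-collections until at least $P$ elements are fully covered. In the second stage, the last sub-collection is further pruned by a greedy strategy.

To control the cost, a "guessing" technique is used. That is, we guess at most $r_{\max}$ heaviest sets in an optimal solution. If an optimal solution contains at most $r_{\max}$ sets, then an optimal solution is obtained directly. Otherwise, for each guess, we remove the heavier sets to obtain a reduced instance and solve the reduced instance. The algorithm chooses the best solution over all guesses. Under the assumption that $r_{\max}$ has a constant upper bound, guessing can be accomplished in polynomial time.

For each guessed sub-collection $\mathcal{G}$ containing $r_{\max}$ sets, the collection of sets $\mathcal{S}'$ used in the reduced instance is obtained from $\mathcal{S}$ by removing the sets in $\mathcal{G}$ as well as all sets heavier than $\min_{S \in \mathcal{G}} c(S)$. The input of the reduced instance (with respect to $\mathcal{G}$) is $(E', \mathcal{S}', c, r', P')$, where $E' = E - C(\mathcal{G})$ is the set of elements still waiting to be fully covered, $P' = P - |C(\mathcal{G})|$ is the remaining total covering requirement, $r'_e = \max\{0, r_e - |\mathcal{G}_e|\}$ is the remaining covering requirement of element $e$, and the elements fully covered by $\mathcal{G}$ are removed from each set. In the following, whenever we mention a reduced instance or say that the instance is updated, it is understood that the above operations are executed.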
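The construction of a reduced instance can be written down directly. This is a sketch under the assumption that an instance is stored as Python dictionaries (set name to element set, set name to cost, element to requirement); `reduce_instance` is a hypothetical name, and the one-set guess in the test only keeps the example small (the paper guesses exactly $r_{\max}$ sets).

```python
def reduce_instance(elements, sets, cost, req, P, guess):
    """Reduced instance (E', S', c, r', P') with respect to the guessed sets G."""
    guess = set(guess)
    g_count = {}                                     # |G_e| for every element e
    for name in guess:
        for e in sets[name]:
            g_count[e] = g_count.get(e, 0) + 1
    covered = {e for e in elements if g_count.get(e, 0) >= req[e]}   # C(G)
    limit = min(cost[name] for name in guess)        # min_{S in G} c(S)
    new_elements = set(elements) - covered           # E' = E - C(G)
    new_req = {e: max(0, req[e] - g_count.get(e, 0)) for e in new_elements}
    new_sets = {name: set(sets[name]) - covered      # drop elements fully covered by G
                for name in sets
                if name not in guess and cost[name] <= limit}
    new_P = P - len(covered)                         # P' = P - |C(G)|
    return new_elements, new_sets, cost, new_req, new_P
```

With the sets of Example IV.1 and assumed costs $c(S_1) = 3$, $c(S_2) = 1$, $c(S_3) = 2$, guessing $\{S_3\}$ fully covers $e_3$, discards the heavier set $S_1$, and leaves $e_1$ needing one more cover.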
The main algorithm is presented in Algorithm 1, in which the subroutine SUB-PSMC is called to compute a solution on the reduced instance with respect to each guess.

Algorithm 1 MAIN ALGORITHM FOR PSMC
Input: A PSMC instance $(E, \mathcal{S}, c, r, P)$
Output: A sub-collection $\mathcal{F}$ fully covering at least $P$ elements.
1: $\mathcal{F} \leftarrow \mathcal{S}$ and $c_0 \leftarrow c(\mathcal{S})$.
2: for each sub-collection $\mathcal{G}$ with $|\mathcal{G}| \le r_{\max}$ do
3:   if $|C(\mathcal{G})| \ge P$ and $c(\mathcal{G}) < c_0$ then
4:     $\mathcal{F} \leftarrow \mathcal{G}$ and $c_0 \leftarrow c(\mathcal{G})$.
5:   end if
6: end for
7: for each sub-collection $\mathcal{G}$ with $|\mathcal{G}| = r_{\max}$ do
8:   Construct the reduced instance $(E', \mathcal{S}', c, r', P')$ with respect to $\mathcal{G}$.
9:   $[tag, \mathcal{A}] \leftarrow$ SUB-PSMC$(E', \mathcal{S}', c, r', P')$.
10:  if $tag = 1$ then
11:    if $c(\mathcal{A}) + c(\mathcal{G}) < c_0$ then
12:      $\mathcal{F} \leftarrow \mathcal{A} \cup \mathcal{G}$ and $c_0 \leftarrow c(\mathcal{A}) + c(\mathcal{G})$.
13:    end if
14:  end if
15: end for
16: Output $\mathcal{F}$.

The output of subroutine SUB-PSMC is a pair $[tag, \mathcal{A}]$. The first component $tag$ indicates whether the reduced instance is feasible. If it is feasible, the subroutine outputs a sub-collection $\mathcal{A}$ which fully covers at least the required number $P'$ of elements.

Algorithm 2 SUB-PSMC$(E', \mathcal{S}', c, r', P')$: SUBROUTINE FOR PSMC
Input: A PSMC instance $(E', \mathcal{S}', c, r', P')$
Output: A pair $[tag, \mathcal{A}]$, in which $tag = 0$ if the instance is infeasible and $tag = 1$ otherwise. In the latter case, $\mathcal{A}$ is a sub-collection fully covering at least $P'$ elements.
1: $\mathcal{A} \leftarrow \emptyset$, $tag \leftarrow 0$.
2: while $P' > 0$ do
3:   Use an $\alpha$-approximation algorithm for MDSC to find a sub-collection $\mathcal{R}$.
4:   if $|C(\mathcal{R})| < P'$ then
5:     $\mathcal{A} \leftarrow \mathcal{A} \cup \mathcal{R}$.
6:     Update the instance.
7:   else
8:     $tag \leftarrow 1$.
9:     if $|C(\mathcal{R})| = P'$ then
10:      $\mathcal{A} \leftarrow \mathcal{A} \cup \mathcal{R}$.
11:    else
12:      $\mathcal{B} \leftarrow$ SUB-PRUNE$(E', \mathcal{R}, c, r', P')$.
13:      $\mathcal{A} \leftarrow \mathcal{A} \cup \mathcal{B}$.
14:    end if
15:    $P' \leftarrow 0$.
16:  end if
17: end while
18: Return $[tag, \mathcal{A}]$.

Algorithm 3 SUB-PRUNE$(E', \mathcal{R}, c, r', P')$: SUBROUTINE FOR PRUNING
Input: An instance $(E', \mathcal{R}, c, r', P')$ such that $\mathcal{R}$ fully covers at least $P'$ elements.
Output: A sub-collection $\mathcal{B} \subseteq \mathcal{R}$ which also fully covers at least $P'$ elements of $E'$.
1: $\mathcal{B} \leftarrow \emptyset$.
2: while $P' > 0$ do
3:   Select a sub-collection $\mathcal{R}' \subseteq \mathcal{R}$ such that $\mathcal{R}' = \arg\min\{c(\mathcal{R}')/|C(\mathcal{R}')| : |\mathcal{R}'| \le r_{\max}\}$.
4:   $\mathcal{B} \leftarrow \mathcal{B} \cup \mathcal{R}'$.
5:   Update the instance.
6: end while
7: Return $\mathcal{B}$.

In each while loop of SUB-PSMC, after finding a sub-collection $\mathcal{R}$ using an $\alpha$-approximation algorithm for MDSC, if the covering requirement $P'$ is still not satisfied, the algorithm takes $\mathcal{R}$ into the solution, updates the instance, and iterates. In the last iteration, in which a sub-collection $\mathcal{R}$ is found fulfilling the remaining total covering requirement, the algorithm prunes $\mathcal{R}$ into a smaller sub-collection $\mathcal{B}$ by subroutine SUB-PRUNE, which still fulfills the remaining total covering requirement, and takes $\mathcal{B}$ into the solution. Note that when SUB-PRUNE is called, $E'$, $r'$, $P'$ may have been updated, although the same notation is used.

In SUB-PRUNE, the algorithm iteratively chooses a sub-collection $\mathcal{R}' \subseteq \mathcal{R}$ containing at most $r_{\max}$ sets with the minimum density. Since $r_{\max}$ has a constant upper bound, the minimum density sub-collection of cardinality at most $r_{\max}$ can be found in polynomial time by enumerating the $O(|\mathcal{R}|^{r_{\max}})$ candidates. The instance is updated in the same way as described previously.
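To see how SUB-PSMC and SUB-PRUNE interact, here is a small executable sketch. It is a simplification under explicit assumptions: the $\alpha$-approximate MDSC oracle of line 3 is replaced by exact brute-force search ($\alpha = 1$, exponential time), the guessing loop of Algorithm 1 is omitted, and "updating the instance" is modelled by tracking how often each element is already covered. The class and function names are hypothetical.

```python
import math
from itertools import combinations


class CoverState:
    """Coverage bookkeeping that plays the role of 'update the instance'."""

    def __init__(self, sets, cost, req, P):
        self.sets, self.cost, self.req, self.P = sets, cost, req, P
        self.count = {e: 0 for e in req}    # how often e is covered so far
        self.done = set()                   # elements already fully covered

    def gain(self, combo):
        """Effective C(R): elements newly fully covered if combo is added."""
        extra = {}
        for name in combo:
            for e in self.sets[name]:
                extra[e] = extra.get(e, 0) + 1
        return {e for e, k in extra.items()
                if e not in self.done and self.count[e] + k >= self.req[e]}

    def commit(self, combo):
        newly = self.gain(combo)
        for name in combo:
            for e in self.sets[name]:
                self.count[e] += 1
        self.done |= newly
        self.P -= len(newly)


def densest(state, pool, max_size):
    """Exact minimum-density sub-collection of pool with at most max_size sets."""
    best, best_den = None, math.inf
    for size in range(1, min(max_size, len(pool)) + 1):
        for combo in combinations(sorted(pool), size):
            g = state.gain(combo)
            if g:
                d = sum(state.cost[name] for name in combo) / len(g)
                if d < best_den:
                    best, best_den = combo, d
    return best


def sub_prune(state, R, r_max):
    """Algorithm 3: greedily take densest pieces of R with at most r_max sets."""
    B, remaining = [], set(R)
    while state.P > 0:
        piece = densest(state, remaining, r_max)
        B.extend(piece)
        remaining -= set(piece)
        state.commit(piece)
    return B


def sub_psmc(sets, cost, req, P, r_max):
    """Algorithm 2 with an exact (alpha = 1) MDSC oracle; returns (tag, A)."""
    state = CoverState(sets, cost, req, P)
    A, pool = [], set(sets)
    while state.P > 0:
        R = densest(state, pool, len(pool))      # line 3
        if R is None:
            return 0, A                          # nothing left helps: infeasible
        pool -= set(R)
        if len(state.gain(R)) <= state.P:        # lines 4-6 and 9-10
            A.extend(R)
            state.commit(R)
        else:                                    # lines 11-13
            A.extend(sub_prune(state, R, r_max))
    return 1, A
```

In the test instance, each of three elements needs two sets and each pair of the three unit-cost sets fully covers exactly one element, so the densest sub-collection uses all three sets (density 1). When only one element is required, SUB-PRUNE replaces it by a pair of sets of cost 2.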
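B. Performance Ratio

We only consider the case when an optimal solution contains more than $r_{\max}$ sets. Suppose $\mathcal{G}_0$ is the collection of the $r_{\max}$ heaviest sets in an optimal solution, $\mathcal{A}_0$ is the sub-collection output by SUB-PSMC on the reduced instance with respect to $\mathcal{G}_0$, and $\mathcal{F}_0 = \mathcal{A}_0 \cup \mathcal{G}_0$. Since the output $\mathcal{F}$ of Algorithm 1 is the best solution over all guesses, we have $c(\mathcal{F}) \le c(\mathcal{F}_0)$. So in the following, we only consider the call of SUB-PSMC on the instance related to $\mathcal{G}_0$.

Suppose line 3 of SUB-PSMC is executed $t$ times, selecting sub-collections $\mathcal{R}_1, \ldots, \mathcal{R}_t$, and then $\mathcal{R}_t$ is pruned into $\mathcal{B}$ by SUB-PRUNE (the case in which $\mathcal{R}_t$ need not be pruned is simpler and thus omitted here). We estimate the costs $\sum_{i=1}^{t-1} c(\mathcal{R}_i)$ and $c(\mathcal{B})$ separately. In the following, $OPT$ denotes an optimal solution to PSMC, and $opt = c(OPT)$.

Lemma III.1. $\sum_{i=1}^{t-1} c(\mathcal{R}_i) \le \alpha H(P)(opt - c(\mathcal{G}_0))$.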
Proof. For $i = 1, 2, \ldots, t$, denote by $P_i$ the remaining covering requirement after $\mathcal{R}_i$ is selected. Then $|C(\mathcal{R}_i)| = P_{i-1} - P_i$ for $i = 1, \ldots, t-1$, where $P_0 = P' \le P$ is the input total covering requirement of SUB-PSMC related to $\mathcal{G}_0$. After the $(i-1)$-th iteration, $OPT - \mathcal{G}_0$ is a sub-collection fulfilling the remaining covering requirement $P_{i-1}$. So the density of an optimal solution $\mathcal{R}^*_i$ to the MDSC problem in the $i$-th iteration is upper bounded by $(opt - c(\mathcal{G}_0))/P_{i-1}$. Since $\mathcal{R}_i$ approximates the density of $\mathcal{R}^*_i$ within a factor of $\alpha$, we have

$$\frac{c(\mathcal{R}_i)}{P_{i-1} - P_i} \le \alpha \frac{opt - c(\mathcal{G}_0)}{P_{i-1}}. \qquad (1)$$

So,

$$\sum_{i=1}^{t-1} c(\mathcal{R}_i) \le \alpha (opt - c(\mathcal{G}_0)) \sum_{i=1}^{t-1} \frac{P_{i-1} - P_i}{P_{i-1}} \le \alpha H(P)(opt - c(\mathcal{G}_0)),$$

where the last inequality holds because $\frac{P_{i-1} - P_i}{P_{i-1}} \le \sum_{j = P_i + 1}^{P_{i-1}} \frac{1}{j}$, so the sum telescopes to at most $H(P_0) \le H(P)$.

The lemma is proved.
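The last step of the proof uses the standard bound $(P_{i-1} - P_i)/P_{i-1} \le H(P_{i-1}) - H(P_i)$, which telescopes. A quick exact check in Python (illustrative only; `harmonic`, `greedy_sum` and `random_run` are hypothetical names) tests the inequality on random strictly decreasing sequences $P_0 > P_1 > \cdots \ge 1$; the bound is tight when the requirement drops by one in every iteration.

```python
import random
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def harmonic(n):
    """H(n) = 1 + 1/2 + ... + 1/n as an exact fraction (H(0) = 0)."""
    return harmonic(n - 1) + Fraction(1, n) if n > 0 else Fraction(0)


def greedy_sum(remaining):
    """Sum of (P_{i-1} - P_i) / P_{i-1} over consecutive entries of P_0 > P_1 > ..."""
    return sum((Fraction(a - b, a) for a, b in zip(remaining, remaining[1:])),
               Fraction(0))


def random_run(p0, rng):
    """A random strictly decreasing sequence of remaining requirements."""
    seq = [p0]
    while seq[-1] > 1 and rng.random() < 0.8:
        seq.append(rng.randint(1, seq[-1] - 1))
    return seq


rng = random.Random(0)
for _ in range(200):
    seq = random_run(rng.randint(1, 100), rng)
    # Telescoping bound used in the proof of Lemma III.1.
    assert greedy_sum(seq) <= harmonic(seq[0]) - harmonic(seq[-1]) <= harmonic(seq[0])
```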
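Lemma III.2. $c(\mathcal{B}) \le \alpha \cdot opt - (\alpha - 1)c(\mathcal{G}_0)$.

Proof. Let SUB-PRUNE-EXT be an extension of algorithm SUB-PRUNE on input sub-collection $\mathcal{R}_t$ in the following way: instead of halting when at least $P_{t-1}$ elements are fully covered, it continues until all elements in $C(\mathcal{R}_t)$ are fully covered (notice that the input $P'$ of SUB-PRUNE equals $P_{t-1}$, the remaining covering requirement before the last iteration of SUB-PSMC). Suppose the sub-collections selected by SUB-PRUNE-EXT are $\mathcal{R}'_1, \ldots, \mathcal{R}'_l$ in this order, and $q$ is the first index at which at least $P_{t-1}$ elements are fully covered, that is,

$$|C(\mathcal{R}'_1)| + |C(\mathcal{R}'_2)| + \cdots + |C(\mathcal{R}'_{q-1})| < P_{t-1} \qquad (2)$$

and $|C(\mathcal{R}'_1)| + |C(\mathcal{R}'_2)| + \cdots + |C(\mathcal{R}'_q)| \ge P_{t-1}$. Since SUB-PRUNE and SUB-PRUNE-EXT coincide up to this point, $\mathcal{B} = \mathcal{R}'_1 \cup \cdots \cup \mathcal{R}'_q$. By the greedy strategy, we have

$$\frac{c(\mathcal{R}'_1)}{|C(\mathcal{R}'_1)|} \le \frac{c(\mathcal{R}'_2)}{|C(\mathcal{R}'_2)|} \le \cdots \le \frac{c(\mathcal{R}'_l)}{|C(\mathcal{R}'_l)|}.$$

Hence

$$\frac{c(\mathcal{R}'_1) + \cdots + c(\mathcal{R}'_{q-1})}{|C(\mathcal{R}'_1)| + \cdots + |C(\mathcal{R}'_{q-1})|} \le \frac{c(\mathcal{R}'_1) + \cdots + c(\mathcal{R}'_l)}{|C(\mathcal{R}'_1)| + \cdots + |C(\mathcal{R}'_l)|} \le \frac{c(\mathcal{R}_t)}{|C(\mathcal{R}_t)|}. \qquad (3)$$

Similarly to the derivation of inequality (1), we have

$$\frac{c(\mathcal{R}_t)}{|C(\mathcal{R}_t)|} \le \alpha \frac{opt - c(\mathcal{G}_0)}{P_{t-1}}.$$

Combining this with (2) and (3),

$$\sum_{i=1}^{q-1} c(\mathcal{R}'_i) \le \alpha (opt - c(\mathcal{G}_0)). \qquad (4)$$

Recall that every set used in SUB-PSMC on the reduced instance has cost at most $\min_{S \in \mathcal{G}_0} c(S)$, and each sub-collection $\mathcal{R}'_i$ consists of at most $r_{\max}$ sets. So

$$c(\mathcal{R}'_q) \le r_{\max} \min_{S \in \mathcal{G}_0} c(S) \le c(\mathcal{G}_0). \qquad (5)$$

The desired upper bound for $c(\mathcal{B})$ follows from (4) and (5): $c(\mathcal{B}) = \sum_{i=1}^{q} c(\mathcal{R}'_i) \le \alpha(opt - c(\mathcal{G}_0)) + c(\mathcal{G}_0) = \alpha \cdot opt - (\alpha - 1)c(\mathcal{G}_0)$.

Since $c(\mathcal{F}) \le c(\mathcal{F}_0) = c(\mathcal{G}_0 \cup \mathcal{R}_1 \cup \cdots \cup \mathcal{R}_{t-1} \cup \mathcal{B})$, the performance ratio follows from Lemma III.1 and Lemma III.2:

$$c(\mathcal{F}) \le c(\mathcal{G}_0) + \alpha H(P)(opt - c(\mathcal{G}_0)) + \alpha \cdot opt - (\alpha - 1)c(\mathcal{G}_0) = \alpha(H(P) + 1)\, opt - (\alpha H(P) + \alpha - 2)c(\mathcal{G}_0) \le \alpha(H(P) + 1)\, opt,$$

where the last step uses $\alpha \ge 1$ and $H(P) \ge 1$.

Theorem III.3. Using an $\alpha$-approximation algorithm for MDSC, the PSMC problem admits an approximation within factor $\alpha(H(P) + 1)$.

IV. APPROXIMATION FOR MDSC

This section presents an approximation algorithm for MDSC. We first formulate the problem as a linear program using a language similar to "flow".

A. An LP-Formulation

For an element $e$, an $r_e$-cover-set is a sub-collection $\mathcal{S}'$ with $|\mathcal{S}'| = r_e$ which fully covers $e$. Denote by $\mathcal{Q}_e$ the family of all $r_e$-cover-sets. Consider the following example.

Example IV.1. $E = \{e_1, e_2, e_3\}$, $\mathcal{S} = \{S_1, S_2, S_3\}$ with $S_1 = \{e_1, e_2, e_3\}$, $S_2 = \{e_1\}$ and $S_3 = \{e_1, e_3\}$; $r(e_1) = 2$ and $r(e_2) = r(e_3) = 1$.

For this example, $\mathcal{Q}_{e_1} = \{\{S_1, S_2\}, \{S_1, S_3\}, \{S_2, S_3\}\}$, $\mathcal{Q}_{e_2} = \{\{S_1\}\}$ and $\mathcal{Q}_{e_3} = \{\{S_1\}, \{S_3\}\}$. It should be emphasized that the same cover-set belonging to different $\mathcal{Q}_e$'s is viewed as different cover-sets. For example, $\{S_1\}$ belongs to both $\mathcal{Q}_{e_2}$ and $\mathcal{Q}_{e_3}$. To distinguish them, we use $Q_j^{(e_i)}$ to denote the cover-sets in $\mathcal{Q}_{e_i}$. For example, $\mathcal{Q}_{e_1}$ contains three $r_{e_1}$-cover-sets $Q_1^{(e_1)} = \{S_1, S_2\}$, $Q_2^{(e_1)} = \{S_1, S_3\}$ and $Q_3^{(e_1)} = \{S_2, S_3\}$; $\mathcal{Q}_{e_2}$ contains one $r_{e_2}$-cover-set $Q_1^{(e_2)} = \{S_1\}$; and $\mathcal{Q}_{e_3}$ contains two $r_{e_3}$-cover-sets $Q_1^{(e_3)} = \{S_1\}$ and $Q_2^{(e_3)} = \{S_3\}$.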
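The families $\mathcal{Q}_e$ can be enumerated mechanically: an $r_e$-cover-set is just an $r_e$-subset of the sets that contain $e$. The sketch below (hypothetical helper `cover_sets`, sets stored as a dictionary) reproduces the families of Example IV.1. When $r_{\max}$ is a constant, each $\mathcal{Q}_e$ has at most $|\mathcal{S}|^{r_{\max}}$ members, so the enumeration takes polynomial time.

```python
from itertools import combinations


def cover_sets(sets, req):
    """Q_e for every element e: all r_e-subsets of the sets that contain e."""
    families = {}
    for e, r_e in req.items():
        containing = sorted(name for name, members in sets.items() if e in members)
        families[e] = [frozenset(c) for c in combinations(containing, r_e)]
    return families


sets = {"S1": {"e1", "e2", "e3"}, "S2": {"e1"}, "S3": {"e1", "e3"}}
req = {"e1": 2, "e2": 1, "e3": 1}
for e, family in cover_sets(sets, req).items():
    print(e, [sorted(q) for q in family])
# e1 [['S1', 'S2'], ['S1', 'S3'], ['S2', 'S3']]
# e2 [['S1']]
# e3 [['S1'], ['S3']]
```

Each element gets its own list, so the same frozen set `{S1}` appears separately under `e2` and `e3`, matching the convention that $Q_1^{(e_2)}$ and $Q_1^{(e_3)}$ are different cover-sets.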
The following is an integer program for bounded MDSC:

$$
\begin{aligned}
\min\ & \frac{\sum_{S \in \mathcal{S}} c_S x_S}{\sum_{e \in E} y_e} && (6)\\
\text{s.t.}\ & \sum_{Q : Q \in \mathcal{Q}_e} l_Q \ge y_e && \text{for } e \in E\\
& x_S \ge \sum_{Q : S \in Q \in \mathcal{Q}_e} l_Q && \text{for } e \in E,\ S \in \mathcal{S}\\
& x_S \in \{0, 1\} && \text{for } S \in \mathcal{S}\\
& y_e \in \{0, 1\} && \text{for } e \in E\\
& l_Q \in \{0, 1\} && \text{for every } Q \in \mathcal{Q}_e \text{ for some } e \in E
\end{aligned}
$$

In fact, $l_Q$ indicates whether a cover-set $Q$ is selected and $x_S$ indicates whether set $S$ is selected. The first constraint says that if $y_e = 1$ then at least one $r_e$-cover-set is selected, and thus $e$ is fully covered. The family of selected sets is the union of all selected cover-sets; so the second constraint says that if $S$ belongs to some selected $r_e$-cover-set, then $S$ must be selected. The objective function is exactly the density of the selected sets.

Consider Example IV.1 again. Setting $l_{Q_2^{(e_1)}} = l_{Q_1^{(e_3)}} = 1$ and all other $l$-values to 0 implies that the selected sub-collection is $\mathcal{S}' = Q_2^{(e_1)} \cup Q_1^{(e_3)} = \{S_1, S_3\}$ and that $e_1$, $e_3$ are fully covered. By the second constraint, $x_{S_1} = x_{S_3} = 1$, and we may take $x_{S_2} = 0$ (to minimize the objective function, it is better to set $x_S$ to 0 if the right-hand side of the second constraint is 0). By the first constraint, $y_{e_2} = 0$, and we may take $y_{e_1} = y_{e_3} = 1$ (to minimize the objective function, it is better to set $y_e$ to 1 for all elements $e$ which are fully covered). Notice that $\{S_1\}$ serves as both $Q_1^{(e_2)}$ and $Q_1^{(e_3)}$; the $l$-value of the former is 0 and the $l$-value of the latter is 1, since they are set independently.

The above integer program (6) can be relaxed to the following linear program LP1:

$$
\begin{aligned}
\min\ & \sum_{S \in \mathcal{S}} c_S x_S && (7)\\
\text{s.t.}\ & \sum_{e \in E} y_e = 1\\
& \sum_{Q : Q \in \mathcal{Q}_e} l_Q \ge y_e && \text{for } e \in E\\
& x_S \ge \sum_{Q : S \in Q \in \mathcal{Q}_e} l_Q && \text{for } e \in E,\ S \in \mathcal{S}\\
& x_S \ge 0 && \text{for } S \in \mathcal{S}\\
& y_e \ge 0 && \text{for } e \in E\\
& l_Q \ge 0 && \text{for } Q \in \mathcal{Q}_e \text{ for some } e \in E.
\end{aligned}
$$

It should be noticed that under the assumption that $r_{\max}$ is upper bounded by a constant, the numbers of variables and constraints in (7) are polynomial (each $\mathcal{Q}_e$ contains at most $|\mathcal{S}|^{r_{\max}}$ cover-sets), and thus (7) is polynomial-time solvable.

Lemma IV.2. The optimal value of the above linear program, denoted $opt_{LP_1}$, satisfies $opt_{LP_1} \le opt_{MDSC}$, where $opt_{MDSC}$ is the optimal value of the integer program (6).

Proof. Let $(x^*, y^*, l^*)$ be an optimal solution to (6), and suppose $\sum_{e \in E} y^*_e = P^*$. Then $(x^*/P^*, y^*/P^*, l^*/P^*)$ is a feasible solution to (7). Hence $opt_{LP_1} \le \sum_{S \in \mathcal{S}} c_S (x^*_S/P^*) = \sum_{S \in \mathcal{S}} c_S x^*_S / \sum_{e \in E} y^*_e = opt_{MDSC}$.
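The scaling argument in the proof can be checked numerically on Example IV.1. The sketch below assumes unit costs (the example does not fix costs) and uses hypothetical helper names; it takes the integral solution discussed above ($l = 1$ on $Q_2^{(e_1)}$ and $Q_1^{(e_3)}$), divides it by $P^* = 2$, and verifies the constraints of (7) with exact fractions.

```python
from fractions import Fraction
from itertools import combinations


def cover_sets(sets, req):
    """Q_e: all r_e-subsets of the sets that contain e."""
    return {e: [frozenset(c) for c in combinations(
                sorted(n for n, m in sets.items() if e in m), r)]
            for e, r in req.items()}


def lp1_feasible(sets, req, Q, x, y, l):
    """Check the constraints of LP (7); l is keyed by (element, cover-set)."""
    if sum(y.values()) != 1:
        return False
    for e in req:
        if sum(l.get((e, q), 0) for q in Q[e]) < y.get(e, 0):
            return False
        for S in sets:
            if sum(l.get((e, q), 0) for q in Q[e] if S in q) > x.get(S, 0):
                return False
    return all(v >= 0 for part in (x, y, l) for v in part.values())


def scale(values, factor):
    """Divide every entry of a solution vector by factor, exactly."""
    return {k: Fraction(v, factor) for k, v in values.items()}


sets = {"S1": {"e1", "e2", "e3"}, "S2": {"e1"}, "S3": {"e1", "e3"}}
req = {"e1": 2, "e2": 1, "e3": 1}
cost = {"S1": 1, "S2": 1, "S3": 1}           # assumed unit costs
Q = cover_sets(sets, req)

# Integral solution of (6): l = 1 on Q_2^(e1) = {S1, S3} and Q_1^(e3) = {S1}.
x_int = {"S1": 1, "S2": 0, "S3": 1}
y_int = {"e1": 1, "e2": 0, "e3": 1}
l_int = {("e1", frozenset({"S1", "S3"})): 1, ("e3", frozenset({"S1"})): 1}

P_star = sum(y_int.values())                  # P* = 2
x, y, l = (scale(d, P_star) for d in (x_int, y_int, l_int))
objective = sum(cost[S] * x[S] for S in sets)  # equals the density of (6)
```

The scaled point satisfies every constraint of (7), and its objective value $(c_{S_1} + c_{S_3})/P^* = 1$ equals the objective of the integral solution in (6), which is exactly the inequality used in the lemma.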
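B. The Algorithm

The algorithm for bounded MDSC makes use of an approximation algorithm for the minimum node-weighted Steiner network problem.

Definition IV.3 (Node-Weighted Steiner Network Problem (NWSN)). Given a graph $G = (V, E)$ with a weight function $c$ on $V$ and a connectivity requirement $r_{s,t}$ for each pair of nodes $(s, t)$, the minimum node-weighted Steiner network problem asks for a subgraph $H$ such that every pair of nodes $(s, t)$ is connected by at least $r_{s,t}$ edge-disjoint paths in $H$ and the node weight of $H$ is as small as possible.

Notice that $H$ must include every node $s$ with $r_{s,t} \ne 0$ for at least one node $t$; such a node $s$ is called a terminal node. On the other hand, nodes $s$ with $r_{s,t} = 0$ for every $t \ne s$ need not be included in $H$; such nodes are Steiner nodes. The NWSN problem is thus to select a set of Steiner nodes with the minimum weight that satisfies the connectivity requirements between terminal nodes.

Our algorithm for bounded MDSC is presented in Algorithm 4. The NWSN instance used in line 4 of the algorithm is constructed as follows. Let $H$ be the graph on node set $Y_{i_0} \cup \mathcal{S} \cup \{s\}$ and edge set $\{eS : e \in Y_{i_0}, S \in \mathcal{S}, e \in S\} \cup \{sS : S \in \mathcal{S}\}$. Set the weight of every node $S \in \mathcal{S}$ to $c_S$ and the weight of all other nodes to zero. Set the connectivity requirement $r_{s,e} = r_e$ for every $e \in Y_{i_0}$, and the connectivity requirement of all other node pairs to zero. Denote the constructed instance by $(H, c, r)$.

Algorithm 4 ALGORITHM FOR BOUNDED MDSC
Input: An MDSC instance $(E, \mathcal{S}, c, r)$
Output: A sub-collection $\mathcal{S}'$.
1: Find an optimal solution $(x^f, y^f, l^f)$ to linear program (7).
2: Let $Y_i = \{e \in E : 2^{-(i+1)} < y^f_e \le 2^{-i}\}$ for $0 \le i \le I - 1$ and $Y_I = \{e \in E : y^f_e \le 2^{-I}\}$, where $I = 2\lfloor \log n \rfloor - 1$.
3: Let $i_0$ be an index such that $|Y_{i_0}| \ge 2^{i_0}/(I + 1)$.
4: Find an approximate solution $H'$ to NWSN on the instance $(H, c, r)$ constructed as described above.
5: Output $\mathcal{S}' = V(H') \setminus (Y_{i_0} \cup \{s\})$.

The rationale behind the algorithm will become clear through the analysis in the following subsection.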
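Steps 2 and 3 of Algorithm 4 can be sketched as follows. This is an illustration only: it assumes the logarithm in $I = 2\lfloor \log n \rfloor - 1$ is base 2 (the text does not state the base), that the fractional values $y^f_e \in [0, 1]$ from step 1 are given as a dictionary, and that $n \ge 32$ as in Theorem IV.4. It returns the first index satisfying the condition of step 3, or `None`. `bucket_elements` and `pick_index` are hypothetical names.

```python
import math


def bucket_elements(y, n):
    """Step 2: split elements into Y_0, ..., Y_I by the size of y_e (0 <= y_e <= 1)."""
    I = 2 * math.floor(math.log2(n)) - 1
    buckets = [[] for _ in range(I + 1)]
    for e, val in y.items():
        i = 0
        # Advance while val <= 2^-(i+1); on exit either 2^-(i+1) < val <= 2^-i,
        # or i == I and val <= 2^-I.
        while i < I and val <= 2.0 ** -(i + 1):
            i += 1
        buckets[i].append(e)
    return I, buckets


def pick_index(I, buckets):
    """Step 3: first i with |Y_i| >= 2^i / (I + 1), compared without floats."""
    for i, bucket in enumerate(buckets):
        if len(bucket) * (I + 1) >= 2 ** i:
            return i
    return None
```

For $n = 32$ we get $I = 9$; four elements with $y^f_e = 1/4$ land in $Y_2$, which already satisfies $|Y_2| \ge 2^2/10$. The analysis in the text additionally uses $i_0 \le I - 1$, which this sketch does not enforce.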
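C. Theoretical Analysis

Notice that any feasible solution to the NWSN problem on instance $(H, c, r)$ induces a feasible solution to the multi-cover problem on instance $(Y_{i_0}, \mathcal{S}, c, r)$. In fact, suppose element $e \in Y_{i_0}$ is connected to node $s$ by $r_{s,e} = r_e$ edge-disjoint paths of the form $\{sS_ie\}_{i=1}^{r_e}$; then $\{S_i\}_{i=1}^{r_e}$ fully covers element $e$. More generally, the last edges of $r_e$ edge-disjoint $(s, e)$-paths are distinct edges of the form $S_ie$, so $H'$ contains $r_e$ distinct sets containing $e$ even if some paths pass through other element nodes. Taking the union of such sets fully covers all elements in $Y_{i_0}$.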
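The correspondence between cover-sets and path-sets can be illustrated on Example IV.1 with $Y_{i_0} = \{e_1, e_3\}$. The sketch below is hypothetical throughout: the costs are assumed, the root is named `"s"` (so no element or set may use that name), and the helpers only build $H$ and inspect given paths; they do not solve NWSN.

```python
def build_nwsn_instance(Y, sets, cost, req, root="s"):
    """Graph H on Y, the set nodes, and root; set nodes weigh c_S, others weigh 0."""
    nodes = set(Y) | set(sets) | {root}
    edges = {frozenset((root, S)) for S in sets}
    edges |= {frozenset((e, S)) for e in Y for S, members in sets.items() if e in members}
    weight = {v: cost[v] if v in sets else 0 for v in nodes}
    demand = {(root, e): req[e] for e in Y}             # r_{s,e} = r_e
    return nodes, edges, weight, demand


def paths_from_cover_set(root, e, Q):
    """The path-set P(Q) = {s S e : S in Q}."""
    return [(root, S, e) for S in sorted(Q)]


def is_edge_disjoint_path_set(paths, edges):
    """True if every path uses edges of H and no edge is used twice."""
    used = []
    for p in paths:
        for u, v in zip(p, p[1:]):
            edge = frozenset((u, v))
            if edge not in edges:
                return False
            used.append(edge)
    return len(used) == len(set(used))


def sets_reaching(e, paths):
    """Set nodes on the last edge of each path into e (distinct when edge-disjoint)."""
    return {p[-2] for p in paths if p[-1] == e}
```

A path-set may take a detour such as s-S3-e3-S1-e1 together with s-S2-e1; its last edges still identify two distinct sets containing $e_1$ (here $S_1$ and $S_2$), which is all the covering argument needs.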
terminal node. On the other hand, those nodes s with rs,t = 0
for any t 6= s need not be included in H. Such nodes are It should be remarked that an optimal solution automatically
Steiner nodes. The NWSN problem is to select a set of Steiner satisfies lQ ≤ 1. As a consequence, we also have xS ≤ 1. So,
IEEE INFOCOM 2017 - IEEE Conference on Computer Communications

(8) does not explicitly require $x_S$ and $l_Q$ to be no greater than 1.

LP-relaxation for NWSN. Next, consider the node-weighted Steiner network problem. For each pair of nodes $s$ and $t$, an $r_{s,t}$-path-set is a set of $r_{s,t}$ edge-disjoint $(s,t)$-paths in $G$. Denote by $\mathcal{P}_{s,t}$ the family of all $r_{s,t}$-path-sets and let $\mathcal{P}$ be the union of all these families. The following linear program LP3, presented in [2], is a relaxation of the NWSN problem:

$$
\begin{aligned}
\min\ & \sum_{v\in V} c_v x_v \\
\text{s.t. } & \sum_{P\in\mathcal{P}_{s,t}} l_P \ge 1 && \text{for } s, t \in V\\
& x_v \ge \sum_{P:\,v\in P\in\mathcal{P}_{s,t}} l_P && \text{for } v \text{ and } s, t \in V\\
& x_v \ge 0 && \text{for } v \in V\\
& l_P \ge 0 && \text{for } P \in \mathcal{P}
\end{aligned}
\tag{9}
$$

In fact, consider the corresponding integral formulation, in which $l_P$ and $x_v$ take values only in $\{0, 1\}$. There, $l_P$ indicates whether path-set $P$ is chosen, and $x_v$ indicates whether node $v$ is chosen. The model in [2] uses equality instead of inequality in the first constraint. This means that, for each pair of nodes $s$ and $t$, exactly one $r_{s,t}$-path-set is chosen, and thus the connectivity requirement between $s$ and $t$ is satisfied. The second constraint says that if node $v$ belongs to some chosen path-set, then $v$ must be chosen. Hence the chosen nodes are exactly those on the union of the chosen path-sets, and the objective minimizes their total weight. When the variables are relaxed to fractional values, any optimal solution automatically has $l_P \le 1$, $x_v \le 1$, and $\sum_{P\in\mathcal{P}_{s,t}} l_P = 1$. Hence it does not matter that we relax the first constraint to an inequality and do not explicitly require $x_v$ and $l_P$ to be no greater than 1.

Now we are ready to analyze the performance ratio of Algorithm 4.

Theorem IV.4. For $n \ge 32$, Algorithm 4 has performance ratio at most $O((\log n)^2)$ for the bounded MDSC problem.

Proof. We prove the theorem by first establishing the following three claims.

Claim 1. An index $i_0$ as in Line 3 of Algorithm 4 exists, and $i_0 \le I - 1$.

In fact, since $\sum_{e\in E} y_e^f = 1$, averaging over the $I+1$ groups $Y_0, \dots, Y_I$ shows that there exists an index $i_0$ such that $\sum_{e\in Y_{i_0}} y_e^f \ge 1/(I+1)$. Since $y_e^f \le 2^{-i_0}$ for every $e \in Y_{i_0}$, we have $|Y_{i_0}| \ge 2^{i_0}/(I+1)$. Notice that $\sum_{e\in Y_I} y_e^f \le n2^{-I} < 1/(I+1)$ for $n \ge 32$. Hence the above $i_0 \le I - 1$.

Claim 2. $\mathrm{opt}_{LP2} \le 2^{i_0+1}\,\mathrm{opt}_{MDSC}$, where $\mathrm{opt}_{LP2}$ is the optimal value of LP2 (linear program (8)).

Let $\hat{x}_S = 2^{i_0+1} x_S^f$ for each set $S$ and let $\hat{l}_Q = 2^{i_0+1} l_Q^f$ for each cover-set $Q$. For any element $e \in Y_{i_0}$ and any set $S \in \mathcal{S}$,

$$
\hat{x}_S = 2^{i_0+1} x_S^f \ge 2^{i_0+1} \sum_{Q:\,S\in Q\in\mathcal{Q}_e} l_Q^f = \sum_{Q:\,S\in Q\in\mathcal{Q}_e} \hat{l}_Q .
$$

Since $i_0 \le I - 1$, we have $y_e^f \ge 2^{-(i_0+1)}$ for every $e \in Y_{i_0}$. Hence

$$
\sum_{Q\in\mathcal{Q}_e} \hat{l}_Q = 2^{i_0+1} \sum_{Q\in\mathcal{Q}_e} l_Q^f \ge 2^{i_0+1} y_e^f \ge 1
$$

for every $e \in Y_{i_0}$. This implies that $\{\hat{x}_S, \hat{l}_Q\}$ is a feasible solution to LP2. Hence

$$
\mathrm{opt}_{LP2} \le \sum_{S\in\mathcal{S}} c_S \hat{x}_S = 2^{i_0+1} \sum_{S\in\mathcal{S}} c_S x_S^f = 2^{i_0+1}\,\mathrm{opt}_1 \le 2^{i_0+1}\,\mathrm{opt}_{MDSC},
$$

where the last inequality comes from Lemma IV.2.

Claim 3. $\mathrm{opt}_{LP3} \le \mathrm{opt}_{LP2}$, where $\mathrm{opt}_{LP3}$ is the optimal value of LP3 (linear program (9)).

Suppose $(x, l)$ is a feasible solution to LP2. For each $r_e$-cover-set $Q$, let $P(Q) = \{sSe\}_{S\in Q}$ be the $r_{s,e}$-path-set corresponding to $Q$. Setting $l_{P(Q)} = l_Q$ induces a feasible solution to LP3. The claim is proved.

It was shown by Chekuri et al. [2] that the integrality gap of LP3 is $O(r_{\max} \log n)$, which is $O(\log n)$ under the assumption that $r_{\max}$ is a constant. Combining this with Claims 2 and 3, the output $S'$ of Algorithm 4 has cost

$$
c(S') \le 2^{i_0+1}\, O(\log n)\, \mathrm{opt}_{MDSC}.
$$

Since $S'$ fully covers all elements in $Y_{i_0}$, we have

$$
\frac{c(S')}{|C(S')|} \le \frac{c(S')}{|Y_{i_0}|} \le \frac{2^{i_0+1}\, O(\log n)\, \mathrm{opt}_{MDSC}}{2^{i_0}/(I+1)} = O((\log n)^2)\, \mathrm{opt}_{MDSC}.
$$

The theorem is proved.

D. Application on PSMC and PPIS

Combining Theorem IV.4 with Theorem III.3, both the constrained PSMC problem and the PPIS problem admit an approximation within factor $O((\log n)^2 H(P))$, where $P = \lceil pn \rceil$ for the PPIS problem.

V. CONCLUSION AND DISCUSSION

In this paper, we studied viral marketing under the positive-influence model and investigated the following question: how much resource should we invest in order to occupy a certain percentage of the market? This was formulated as the PPIS problem. As a corollary of the above studies on PSMC, the PPIS problem admits an approximation within factor $O((\log n)^2 H(\lceil pn \rceil))$, as long as the maximum degree of the social network is bounded by a constant. Improving this ratio will depend on improved algorithms for the minimum density subcollection problem.

Our studies show that PPIS is a very challenging problem. One reason is that it possesses a “leapfrogging” property. As an illustration, suppose a user is activated only when at least 10 of his friends possess the product. Even if he has heard about the product from 9 of his friends, he is still not activated. One more message from a friend pushes
him across the threshold, while the effect of the 9 messages is equivalent to nothing. So, even if the company has striven to provide a large number of samples, it is still possible that only very few users are activated. Since only the number of activated users matters, much effort might be wasted on fruitless influence on users who remain inactive. So, in order to obtain a good approximation, one has to control this wasted effort. Such a “leapfrogging” phenomenon is interesting and appears frequently in the real world. New ideas are needed, and conquering such a problem would have great theoretical value.

ACKNOWLEDGMENT
This research is supported by NSFC (61222201, 11531011).

REFERENCES
[1] R. Bar-Yehuda, Using homogeneous weights for approximating the partial cover problem. Journal of Algorithms, 39 (2001) 137–144.
[2] C. Chekuri, A. Ene, A. Vakilian, Prize-collecting survivable network design in node-weighted graphs. APPROX/RANDOM, LNCS 7408 (2012) 98–109.
[3] N. Chen, On the approximability of influence in social networks. SIAM J. Discrete Math., 23(3) 1400–1415. A preliminary version appears in SODA'08, 1029–1037.
[4] V. Chvátal, A greedy heuristic for the set-covering problem. Math. Oper. Res., 4 (1979) 233–235.
[5] D.P. Williamson, D.B. Shmoys, The Design of Approximation Algorithms. Cambridge University Press, 2010.
[6] T.N. Dinh, Y. Shen, D.T. Nguyen, M.T. Thai, On the approximability of positive influence dominating set in social networks. Journal of Combinatorial Optimization, 27 (2014) 487–503.
[7] G. Dobson, Worst-case analysis of greedy heuristics for integer programming with nonnegative data. Math. Oper. Res., 7 (1982) 515–531.
[8] P. Domingos, M. Richardson, Mining the network value of customers. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2001) 57–66.
[9] U. Feige, A threshold of ln n for approximating set cover. In Proc. 28th ACM Symposium on the Theory of Computing (1996) 312–318.
[10] R. Gandhi, S. Khuller, A. Srinivasan, Approximation algorithms for partial covering problems. Journal of Algorithms, 53(1) (2004) 55–84.
[11] D.S. Johnson, Approximation algorithms for combinatorial problems. J. Comput. System Sci., 9 (1974) 256–278.
[12] R.M. Karp, Reducibility among combinatorial problems. In Complexity of Computer Computations, R.E. Miller and J.W. Thatcher, eds., Plenum Press, New York (1972) 85–103.
[13] M. Kearns, The Computational Complexity of Machine Learning. MIT Press, Cambridge, MA, 1990.
[14] D. Kempe, J. Kleinberg, E. Tardos, Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2003) 137–146.
[15] J. Könemann, O. Parekh, D. Segev, A unified approach to approximating partial covering problems. Algorithmica, 59 (2011) 489–509.
[16] L. Lovász, On the ratio of optimal integral and fractional covers. Discrete Math., 13 (1975) 383–390.
[17] Y. Ran, Z. Zhang, H. Du, Y. Zhu, Approximation algorithm for partial positive influence problem in social network. Journal of Combinatorial Optimization, DOI 10.1007/s10878-016-0005-0, 1–12, 2016.
[18] Y. Ran, Y. Shi, Z. Zhang, Local ratio method on partial set multi-cover. Journal of Combinatorial Optimization, DOI 10.1007/s10878-016-0066-0, 2016.
[19] V. Setty, G. Kreitz, G. Urdaneta, R. Vitenberg, M. van Steen, Maximizing the number of satisfied subscribers in pub/sub systems under capacity constraints. INFOCOM 2014, 2580–2588.
[20] P. Slavík, Improved performance of the greedy algorithm for partial cover. Information Processing Letters, 64(5) 251–254.
[21] F. Wang, E. Camacho, K. Xu, Positive influence dominating set in online social networks. In International Conference on Combinatorial Optimization and Applications (2009) 313–321.
[22] H. Zhang, T.N. Dinh, M.T. Thai, Maximizing the spread of positive influence in online social networks. IEEE 33rd International Conference on Distributed Computing Systems (ICDCS 2013), 2013.
