
The Journal of Supercomputing (2022) 78:19372–19400

https://doi.org/10.1007/s11227-022-04613-1

Index‑based top k α‑maximal‑clique enumeration over uncertain graphs

Jing Bai1,2 · Junfeng Zhou1 · Ming Du1 · Ziyang Chen3

Accepted: 15 May 2022 / Published online: 20 June 2022


© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature
2022

Abstract
Uncertain graphs are widespread in the real world. Enumerating maximal cliques over
uncertain graphs is a fundamental problem in many applications. This paper studies
the problem of enumerating the top k maximal cliques with the largest number of vertices
whose probabilities are ≥ 𝛼, where each such maximal clique is called an 𝛼-maximal-clique.
Existing works are inefficient because they spend expensive computation on cliques that
carry little information. To address this problem, we propose an index-based top k
𝛼-maximal-clique enumeration algorithm, which computes 𝛼-maximal-cliques based on an
index to improve the efficiency. We propose two efficient indexes, called the degree-based
index and the core-based index, which sort the vertices in descending order of their degrees
or core numbers to help prune unpromising vertices, so as to avoid enumerating redundant
𝛼-maximal-cliques. We further propose a support-based pruning strategy that avoids
enumerating the 𝛼-maximal-cliques that will not be chosen, to speed up the enumeration.
The experimental results on 20 real-world datasets show that our algorithms can return the
top k 𝛼-maximal-cliques efficiently.

Ming Du and Ziyang Chen have equally contributed to this work.

* Jing Bai
18016297367@163.com
* Junfeng Zhou
zhoujf@dhu.edu.cn

Ming Du
duming@dhu.edu.cn

Ziyang Chen
zychen@ysu.edu.cn

1 School of Computer Science and Technology, Donghua University, 2999 Renmin North Road, Songjiang District, Shanghai 201620, China
2 College of Information Technology, Shanghai Jian Qiao University, 1111 Hucheng Ring Road, Pudong New Area, Shanghai 201306, China
3 Shanghai Lixin University of Accounting and Finance, 2800 Wenxiang Road, Songjiang District, Shanghai 201620, China


Keywords Uncertain graph · Maximal clique · Top k maximal cliques

1 Introduction

A graph can represent the complex relationships between different entities. Extract-
ing dense regions from a graph can help to identify important information. A maxi-
mal clique (MC) is one of the densest subgraphs satisfying that no other cliques can
take it as their proper subgraph [1–4]. MC enumeration is a fundamental problem in
many applications [5–8], such as finding overlapping communities from social net-
works [9–15], recommending products for sales [16, 17], detecting ratings [18, 19], and
discovering followers on social networks. The classic MC enumeration algorithm over
deterministic graphs is the Bron–Kerbosch algorithm proposed by Bron and Kerbosch
[20] in 1973, which employs depth-first search. In 2006, Tomita et al. [21] proposed
an improved algorithm BKPivot through a better strategy for pivot selection based
on Bron–Kerbosch. In 2011, Eppstein et al. [22] further modified Bron–Kerbosch by
choosing more carefully the processing order of vertices.
In many real-world applications, the relationships between vertices always carry
uncertainties due to noise [23], measurement errors [24], the accuracy of predictions
[25, 26] and privacy concerns [27], such as social networks with inferred influence
[28, 29] and sensor networks with uncertain connectivity links [30]. In these appli-
cations, we often model the uncertain relationships using an uncertain graph, where
the existence of each edge is denoted by a probability [31]. MC enumeration over
uncertain graphs is used to find from the underlying graph all MCs satisfying that
each one has probability ≥ 𝛼 , where 𝛼 is a threshold specified by users, and each MC
is called an 𝛼-MC. A specific application of MC enumeration over uncertain graphs
is protein–protein interaction (PPI) networks with experimentally inferred links
[32–34]. The vertex represents the protein molecule, the edge represents the interac-
tion relationship between the protein molecules, and the edge probability represents
the credibility of the interaction [35]. Because of the interference of many factors in
protein experiments, proteins with no direct interaction may be detected by mistake,
resulting in false positives. MCs in the PPI network often correspond to protein
complexes [36], that is, multiple proteins are combined by interaction for a specific
function. Therefore, MC enumeration over uncertain graphs is of great significance
for protein complex recognition and prediction of unknown protein complexes [37].
The most classical MC enumeration algorithm over uncertain graphs was MULE
proposed by Pan et al. [5] in 2015, which employs a heuristic method by processing
vertices according to their IDs. In 2019, Ahmar Rashid et al. [8] proposed an algo-
rithm EMCTDS to enumerate all the 𝛼-MCs, which utilizes a novel index named
h-index to maintain vertices with degrees greater than h. In 2019, Rong-Hua Li et al.
[7] proposed two core-based pruning algorithms to reduce the graph size to improve
the efficiency. However, 𝛼-MC enumeration is time consuming on large graphs, since the
maximum number of 𝛼-MCs in an uncertain graph is $\binom{n}{\lfloor n/2 \rfloor}$, where n is the
number of vertices in the graph [5].

Fig. 1  Examples for uncertain graphs. a G = (V, E, 𝛽) is the original uncertain graph. b G′ = (V′, E′, 𝛽′)
is the degree-based index. c G″ = (V″, E″, 𝛽″) is the core-based index. d G‴ = (V‴, E‴, 𝛽‴) is the
index with support-based pruning

Among them, small 𝛼-MCs carry little informa-
tion and are not considered meaningful. Thus, in practice, we do not need to find all
the 𝛼-MCs but the larger ones.
In this paper, we study the problem of enumerating the largest k (top k) 𝛼-MCs. Most
existing works focus on returning top k MCs over deterministic graphs [38–40]. How-
ever, to the best of our knowledge, only [6, 41], and [42] have tried to study the problem
of enumerating top k MCs over uncertain graphs. [6] and [41] proposed to return k MCs,
satisfying that they have the largest probabilities and the number of vertices is no less
than a given threshold. When ranking MCs according to their probabilities, they usu-
ally return MCs with higher probabilities but fewer vertices and fail to return large MCs
that convey more useful information. According to [6] and [41], for the uncertain graph
in Fig. 1a, when enumerating the top 1 MC and limiting the number of vertices ≥ 3,
the result is {v3 , v6 , v7 }. However, users may be more interested in MC {v8 , v9 , v10 , v11 }
which is larger and more likely to be meaningful. [42] proposed to return the largest k
MCs satisfying that their probabilities ≥ 𝛼, which is called top k 𝛼-MCs. According to
[42], limiting 𝛼 = 0.3, the result of top 1 MC with respect to Fig. 1a is {v8 , v9 , v10 , v11 }.
For the result enumeration, [42] first computes all the MCs regardless of the uncertain-
ties, then computes 𝛼-MCs over them. In practice, [42] suffers from expensive compu-
tations on redundant MCs.
Challenges. For top k 𝛼-MC enumeration over uncertain graphs, whether an MC
is an 𝛼-MC heavily depends on the input parameter 𝛼 , which is a value that the users
usually cannot specify accurately. If 𝛼 is small, users may be overwhelmed by
too many results; if 𝛼 is too large, users may not get satisfactory results.


Therefore, users may need to run the algorithm multiple times to carefully tune the
parameters.
However, running the algorithm multiple times for top k 𝛼-MC enumeration is
time consuming on large graphs. For each execution, we must compute all the 𝛼-
MCs to find the largest k ones. In practice, the parameter k is far less than the total
number of 𝛼-MCs, which means that we may enumerate numerous 𝛼-MCs, where
most of them are not returned. Such redundant computations will decrease the over-
all performance.
Our approach. Motivated by the above observations, we propose an index-based
algorithm, named IKC, to reduce the cost of top k 𝛼-MC enumeration. We only need
to construct the index once. After that, no matter how 𝛼 varies, we can avoid enumerating
the numerous 𝛼-MCs that are not chosen, thereby reducing the time consumption.
We propose a degree-based index, which sorts vertices in descending order
according to their degrees. This index allows vertices with large degrees to be pro-
cessed first, ensuring that larger 𝛼-MCs are produced as early as possible, based on
the fact that the vertices with larger degrees are more likely to be contained in larger
𝛼-MCs. Such an index helps to prune the unpromising vertices that are definitely not
contained in any larger 𝛼-MC, so that redundant enumerations terminate early. Further-
more, we propose a core-based index, which first computes the core numbers [43]
of vertices, then sorts vertices in descending order according to their core numbers.
Core number is a more effective metric to constrain the tightness between vertices of
a clique. With this index, vertices with larger core numbers are processed before oth-
ers, such that larger 𝛼-MCs can be produced as early as possible.
Although the degree-based index and the core-based index are efficient, we find that
some small 𝛼-MCs may still be produced first. This is because when all the vertices of
a small 𝛼-MC have large degrees or large core numbers, we cannot prune them by any
vertex property. For example, considering Fig. 1b, v1’s degree is 7 and v2’s degree is 4,
which means the small 𝛼-MC {v1 , v2 } will be produced earlier than other larger 𝛼-MCs.
To address this problem, we further propose a support-based pruning strategy based
on edge support [44]. The support value of an edge is equal to the number of com-
mon neighbors of the two vertices of the edge. Accordingly, assuming that the support
value of an edge is 1, then only the MCs of size 3 contain this edge. Such small MCs
should not be produced first. Based on this observation, before index construction, we
first compute the support value for each edge and delete the edges with support values
no larger than 1. Such preprocessing can help avoid enumerating the small 𝛼-MCs in
which the vertices have large degrees or core numbers. Moreover, such preprocessing
reduces the size of the input graph to further reduce the cost of the enumeration. Our
contributions are as follows:

• We propose an index-based top k 𝛼-MC enumeration algorithm, namely IKC, which computes 𝛼-MCs based on an index to improve the efficiency.
• We propose two efficient indexes, called degree-based index and core-based index,
which sort vertices in descending order according to their degrees or core numbers
to prune unpromising vertices to improve the efficiency.


• We further propose an optimization, which deletes the edges with support values no larger than 1 to reduce the size of the input graph and speed up the overall performance.
• We conduct experiments on 20 uncertain datasets. The experimental results show that our algorithms can return top k 𝛼-MCs efficiently.

Table 1  The summary of notations

Notation Definition

G An uncertain graph
V The vertex set of G
E The edge set of G
𝛽 The function that maps each edge in E to a probability value in [0,1]
𝛼 The threshold specified by users
k The number of results specified by users
u, v, w A vertex in G
(u, v) An edge between vertices u and v in G
d(u), d(u, G′) The degree of vertex u in G and subgraph G′, respectively
core(u) The core number of vertex u
sup(u,v) The support value of edge (u, v)
r(u) The rank value of vertex u
N(u) The set of neighbors of vertex u in G
dmax The maximum degree
max(C) The vertex in clique C that has the largest ID
|C| The number of vertices in clique C
△uvw The triangle consisting of three vertices u, v and w

Organization. The rest of this paper is organized as follows. Section 2 provides some preliminaries. In Sect. 3, we give a review of existing works. In Sect. 4, we
develop a baseline approach. In Sect. 5, we propose an index-based algorithm IKC and
two indexes, which are called degree-based index and core-based index, respectively. In
Sect. 6, we further propose a support-based pruning. Section 7 shows the experimental
results. Section 8 concludes this paper.

2 Preliminaries

In this section, we formally introduce the notations and definitions. Mathematical nota-
tions used throughout this paper are summarized in Table 1.

Definition 1 (Uncertain graph) An uncertain graph G = (V, E, 𝛽) is an undirected graph, where V represents the set of vertices, E the set of edges, and 𝛽 a function that
maps each edge in E to a probability value in [0, 1].


For example, Fig. 1a is an uncertain graph. Edge (v1 , v2 ) has a probability of 0.8.

Definition 2 (Clique) Given a graph G = (V, E), the vertex set C ⊆ V is a clique, if
C ≠ ∅, and there is an edge between each pair of vertices in C.

For example, there are many cliques in Fig. 1a, such as {v8 , v9 , v10 } and
{v8 , v9 , v10 , v11 }.
For simplicity, we also say the edges connecting vertices of C as C’s edges. Obvi-
ously, given a clique C of n vertices, C has n(n − 1)∕2 edges.

Definition 3 (Maximal Clique (MC)) Given a graph G = (V, E) and clique C ⊆ V , we say
C is a maximal clique (MC), if there is no vertex v ∈ V ⧵ C such that C ∪ {v} is a clique.

For example, in Fig. 1a, {v8 , v9 , v10 } is a clique but not an MC, due to the exist-
ence of the vertex v11 such that {v8 , v9 , v10 } ∪ {v11 } is a clique, and {v8 , v9 , v10 , v11 }
is an MC as there is no vertex v ∈ V ⧵ C such that {v8 , v9 , v10 , v11 } ∪ {v} is a clique.

Definition 4 (Clique size) Given a clique C, the size of C is the number of vertices in C,
denoted as |C|.

For example, C1 = {v8 , v9 , v10 , v11 } is larger than C2 = {v8 , v9 , v10 }, since |C1 |
=4 > |C2 | = 3.

Definition 5 (Clique probability) Given an uncertain graph G = (V, E, 𝛽), for any clique C in G, the clique probability clq(C, G) is the product of the probabilities of all
edges in C.

For example, in Fig. 1a, let C = {v1 , v2 , v3 }, then clq(C, G) = 0.576.

Definition 6 (𝛼-clique) Given an uncertain graph G = (V, E, 𝛽) and a probability threshold 𝛼, C is an 𝛼-clique, if it satisfies (1) C is a clique; and (2) clq(C, G) ≥ 𝛼.

For example, in Fig. 1a, if 𝛼 = 0.5, then C1 = {v1 , v2 , v3 } is an 𝛼-clique because clq(C1 , G) = 0.576 > 0.5, but C3 = {v8 , v9 , v10 , v11 } is not an 𝛼-clique because
clq(C3 , G) = 0.42 < 0.5.

Definition 7 (𝛼-MC) Given an 𝛼-clique C, we say C is an 𝛼-MC, if there is no v ∈ V ⧵ C such that C ∪ {v} is an 𝛼-clique.

For example, in Fig. 1a, if 𝛼 = 0.5, then C1 = {v1 , v2 , v3 } is an 𝛼-MC, but C2 = {v1 , v2 } is not an 𝛼-MC because C2 ∪ {v3 } is an 𝛼-clique.
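To make the computation in Definitions 5 to 7 concrete, the following minimal Python sketch (not part of the paper; the edge probabilities below are hypothetical values chosen so that clq({v1, v2, v3}, G) = 0.576) computes the clique probability and tests the 𝛼-clique condition:

```python
from itertools import combinations

# Hypothetical edge-probability map; frozenset keys model undirected edges.
beta = {
    frozenset({"v1", "v2"}): 0.8,
    frozenset({"v1", "v3"}): 0.9,
    frozenset({"v2", "v3"}): 0.8,
}

def clique_probability(C, beta):
    """clq(C, G): the product of the probabilities of all edges inside C.
    Returns 0.0 if some pair of vertices in C is not connected (C is not a clique)."""
    p = 1.0
    for u, v in combinations(C, 2):
        e = frozenset({u, v})
        if e not in beta:
            return 0.0
        p *= beta[e]
    return p

def is_alpha_clique(C, beta, alpha):
    return len(C) > 0 and clique_probability(C, beta) >= alpha

print(clique_probability({"v1", "v2", "v3"}, beta))    # ≈ 0.576
print(is_alpha_clique({"v1", "v2", "v3"}, beta, 0.5))  # True
```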


Definition 8 (k-core [43]) Given a graph G = (V, E) and an integer k, a k-core is the
maximal connected induced subgraph G� [V � ] in which each vertex has a degree d(u, G� )
of at least k, i.e., for all u ∈ V �, we have d(u, G� ) ≥ k.

For example, in Fig. 1a, given k = 2, the 2-core is G� [V � ] = {v1 , v2 , v3 , v6 , v7 , v8 , v9 , v10 , v11 },
since each vertex in G� [V � ] has a degree of at least 2.

Definition 9 (Core number [43]) Given a graph G = (V, E), the core number for a ver-
tex u, denoted as core(u), is the largest integer of k such that the k-core contains u.

For example, in Fig. 1a, given k = 2, we have core(v1 ) = 2 and core(v8 ) = 3, since
there is a largest k = 2 such that the 2-core G� [V � ] = {v1 , v2 , v3 , v6 , v7 , v8 , v9 , v10 , v11 }
contains v1, and there is a largest k = 3 such that the 3-core G�� [V �� ] = {v8 , v9 , v10 , v11 }
contains v8.

Definition 10 (Support value [44]) Given a graph G = (V, E), the support value of an
edge (u, v) ∈ E, denoted by sup(u,v), is defined as |{△uvw ∣ w ∈ V}|. The support value
of an edge (u, v) is the number of △uvw that contain (u, v) and can be computed by
|N(u) ∩ N(v)|.

For example, in Fig. 1a, sup(v3 ,v8 ) = 0, sup(v2 ,v3 ) = 1, sup(v8 ,v9 ) = 2.
Problem Statement. Given an uncertain graph G = (V, E, 𝛽), a probability
threshold 𝛼 and an integer k, return the top k 𝛼-MCs.

3 Related works

Top k MC enumeration on uncertain graphs. The concept of probability top k MCs on uncertain graphs was first proposed by Zou et al. [6] in 2010. Zhu Rong et al.
[41] further studied how to enumerate top k MCs on large-scale uncertain graphs.
Both [6] and [41] rank MCs according to their probabilities and return k MCs with
the largest probabilities, and their sizes are not less than a given threshold. As a
result, they may return the MCs with higher probabilities but fewer vertices and fail
to return large MCs that convey more useful information. In 2021, Bai Jing et al.
[42] proposed to return the k largest MCs satisfying that their probabilities ≥ 𝛼 to
find the most useful MCs. It first computes all the MCs regardless of the uncertain-
ties and computes 𝛼-MCs over them to reduce the computations of common neigh-
bors. However, the redundant computation of numerous MCs and unpromising 𝛼-
MCs decreases the overall performance.


Top k MC enumeration on deterministic graphs. In 2015, Yuan et al. [38] proposed EnumK, which processes vertices based on depth-first search and main-
tains a result set of k. In 2018, Apurba et al. [39] proposed an algorithm Topk-
MaximalQC, which prunes the vertices with small core numbers to reduce redun-
dant computations. In 2020, Wu et al. [40] proposed an algorithm TOPKLS, which
maintains a result set of k that can be dynamically adjusted to facilitate pruning
small cliques.
This paper studies the top k MCs on uncertain graphs by returning the top k MCs
with probabilities ≥ 𝛼. Our approach is more efficient than existing algorithms by prun-
ing unpromising vertices to avoid redundant enumerations.

4 Baseline approach

Since it is difficult to find the top k 𝛼-MCs before knowing all the 𝛼-MCs, the baseline
approach, i.e., Algorithm 1, first computes all the 𝛼-MCs and then returns the largest k
ones.
Specifically, Algorithm 1 processes vertices in ascending order according to their
IDs, where it starts from the first vertex v1 to find all the 𝛼-MCs containing v1. It
maintains three sets C, A, and X. C is used to store the current 𝛼-MC. A is used to
store the candidate vertices that are about to be added to C. Each vertex in A should
be connected to all the vertices in current C. When expanding C by adding a vertex
from A, it maintains the clique probability of C no less than the threshold 𝛼 . Add-
ing a new vertex u to C will decrease the clique probability, since the new clique
probability of C is equal to the product of the current clique probability and the
probabilities of the edges between u and each vertex in C. Thus, when A updates by
Function GenerateA with C expanding, each tuple (u, r) stored in A satisfies that
u is connected to all the vertices in C, u >max(C), and clq(C ∪ {u}, G) ≥ 𝛼 , where
r denotes the probability that the current clq(C, G) will multiply when vertex u is
added to C. To ensure each 𝛼-MC is non-repeating, X maintains the vertices that are
connected to all the vertices in C but has been processed before. X updates by Func-
tion GenerateX with C expanding. In Function GenerateX, each tuple (u, r) stored
in X needs to satisfy that u is connected to all the vertices in C, u <max(C) and
clq(C ∪ {u}, G) ≥ 𝛼 . The expansion will terminate when both A and X are empty
(line 7), which means that the current C is an 𝛼-MC. It maintains a result set R of k
storing the current largest k 𝛼-MCs according to their sizes.
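Algorithm 1's pseudocode is not reproduced in this excerpt. The following is a minimal Python sketch of the baseline idea under simplifying assumptions (the graph is given only by the edge-probability map beta with frozenset keys, and the C/A/X bookkeeping with explicit (u, r) tuples is replaced by recomputing probabilities). It illustrates the recursion and the size-based top k result set R, not the paper's exact implementation:

```python
def enum_top_k_alpha_mcs(vertices, beta, alpha, k):
    """Enumerate all alpha-MCs Bron-Kerbosch style and keep the k largest ones."""
    results = []                                  # current top k alpha-MCs (by size)

    def extend_prob(C, p, u):
        """clq(C ∪ {u}) given clq(C) = p, or None if u is not connected to all of C
        or the extended probability drops below alpha."""
        for w in C:
            e = frozenset({u, w})
            if e not in beta:
                return None
            p *= beta[e]
        return p if p >= alpha else None

    def expand(C, p, A, X):
        if not A and not X:                       # no alpha-clique extension exists: C is an alpha-MC
            results.append(frozenset(C))
            results.sort(key=len, reverse=True)
            del results[k:]                       # keep only the k largest results
            return
        for u in sorted(A):                       # fixed processing order, as in Algorithm 1
            A.discard(u)
            pu = extend_prob(C, p, u)
            if pu is None:
                continue
            newA = {w for w in A if extend_prob(C | {u}, pu, w) is not None}
            newX = {w for w in X if extend_prob(C | {u}, pu, w) is not None}
            expand(C | {u}, pu, newA, newX)
            X.add(u)                              # u has been processed: avoids duplicate alpha-MCs

    expand(set(), 1.0, set(vertices), set())
    return results

# Example: with beta from the previous sketch and alpha = 0.5, k = 2,
# enum_top_k_alpha_mcs({"v1", "v2", "v3"}, beta, 0.5, 2) returns [frozenset({"v1", "v2", "v3"})].
```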


Example 1 Considering Fig. 1a, given 𝛼 = 0.3 and k = 2, Algorithm 1 enumerates 𝛼-MCs starting from the first vertex v1. With the three sets C, A, and X updating, the running status of C, A, and X is shown in Table 2, where for each tuple (u, r) we omit


Table 2  The running status of C, A, and X

C A X

∅ v1 , v2 , v3 , v4 , v5 , v6 , v7 , v8 , v9 , v10 , v11 ∅
v1 v2 , v 3 ∅
v1 , v 2 v3 ∅
v1 , v 2 , v 3 ∅ ∅
v1 v3 ∅
v1 , v 3 ∅ v2
v2 v3 ∅
v2 , v 3 ∅ v1
v3 v4 , v 5 , v 6 , v 7 , v 8 v1 , v 2
v3 , v 4 ∅ ∅
v3 v5 , v 6 , v 7 , v 8 v1 , v 2
v3 , v 5 ∅ ∅
v3 v6 , v 7 , v 8 v1 , v 2
v3 , v 6 v7 ∅
v3 , v 6 , v 7 ∅ ∅
v3 v7 , v 8 v1 , v 2
v3 , v 7 ∅ v6
v3 v8 v1 , v 2
v3 , v 8 ∅ ∅
. . .

the probability r due to space limit. When both A and X are empty, the current C is an
𝛼-MC. When processing vertex v1, the first 𝛼-MC {v1 , v2 , v3 } is found and added into
result set R. Next, after processing v2, no valid 𝛼-MCs are found. When processing v3,
the first 𝛼-MC {v3 , v4 } is added into R. The next 𝛼-MC {v3 , v5 } is dropped, since there
are already 2 𝛼-MCs in R and its size is no larger than the current smallest one {v3 , v4 }.
When the next 𝛼-MC {v3 , v6 , v7 } is found, it will replace {v3 , v4 } after comparing sizes.
Then, the next 𝛼-MC {v3 , v8 } is dropped, since its size is no larger than the current
smallest one {v1 , v2 , v3 }. The processes of other vertices are similar. Note that, when the
last 𝛼-MC {v8 , v9 , v10 , v11 } is found, it will replace {v1 , v2 , v3 }. Finally, we get the result
set R = {{v3 , v6 , v7 }, {v8 , v9 , v10 , v11 }}.

Analysis. From Algorithm 1, we know that the running time of Function Enum_
Clique dominates the whole performance. An execution of the recursion of Func-
tion Enum_Clique can be viewed as a search tree. Each call to Function Enum_Clique
is a node of this search tree. The first call is the root node. A node in this search tree
is either an internal node that makes one or more recursive calls or a leaf node that
does not make further recursive calls. Specifically, the running time at each leaf node is
O(1). This is because there are no further recursive calls, that is, set A is empty. Check-
ing the size of A takes constant time. The time taken at each internal node is O(|V|).
Line 22 takes O(|V|) time as all vertices in C are added into C′. Line 23 takes constant
time. Line 24 takes O(|V|) time. For each vertex, the total number of calls to Function


Table 3  The number of 𝛼-MCs in real-world datasets, when 𝛼 = 0.3


Dataset # of 𝛼-MCs clique size:#

email-Eu-core 10,957 4:95 3:5,367 2:5,495


amaze 2978 3:32 2:2,946
arxiv 62,213 5:1 4:1,437 3:38,466 2:22,309
yago 31,662 3:6,102 2:25,560
pubmed 29,135 4:26 3:3,037 2:26,072
mtbrv 8,627 3:110 2:8,517
citeseer 32,634 4:38 3:6,340 2:26,256
anthra 10,992 3:50 2:10,942
ecoo 11,283 3:76 2:11,207
agrocyc 11,265 3:78 2:11,187
Email-Enron 176,997 8:3 7:27 6:913 5:889 4:20,945 3:90,454 2:63,766
Slashdot0811 294,905 5:2 4:566 3:29,081 2:265256
Email-EuAll 176,514 3:1,676 2:174,838
web-Google 381,589 4:144 3:26,653 2:354,792
10cit-Patent 1,294,268 4:26 3:16,749 2:1,277,498
05citeseerx 2,263,966 4:827 3:129,132 2:2,134,007
05cit-Patent 2,254,780 4:160 3:71,096 2:2,463,524
WikiTalk 1,842,303 3:5,945 2:1,836,358
dbpedia 5,707,522 5:27 4:13,459 3:962,651 2:4,731,385
cit-Patents 12,084,657 5:26 4:20,213 3:1,293,219 2:10,771,199

Enum_Clique is no more than O(2^{|V|}). In conclusion, the time complexity of Algorithm 1 is O(|V| ⋅ 2^{|V|}) [5].
Correctness of Algorithm 1. Algorithm 1 enumerates all the 𝛼-MCs and compares
their sizes to find the largest k ones. Thus, Algorithm 1 does not miss any correct result.

5 Index‑based approach

Algorithm 1 computes all the 𝛼-MCs to find the largest k ones. In practice, there are
numerous small 𝛼-MCs in real-world datasets as shown in Table 3, where most of them
are not chosen. Moreover, users may run Algorithm 1 multiple times to find satisfactory
results since the results heavily depend on the parameter 𝛼. The redundant computation
of the 𝛼-MCs that are not chosen will decrease the overall performance.
Algorithm 1 processes vertices in ascending order according to their IDs. How-
ever, vertices with preferential IDs may not be contained in any large 𝛼-MC, such
as v4 and v5 in Example 1. Early processing of such vertices results in small 𝛼-MCs
being produced early. We must continue enumerating because we do not know if
the 𝛼-MCs produced later are larger. However, if large 𝛼-MCs can be produced ear-
lier, we can employ the pruning strategy to avoid processing the unpromising verti-
ces that are not contained in any larger 𝛼-MC. Pruning such vertices means avoiding


numerous enumerations of 𝛼-MCs that are not chosen, thus reducing the redundant
computation. Therefore, we conclude that the processing order of vertices dominates the
efficiency of enumeration.
Based on the above conclusion, we propose an index-based algorithm IKC, i.e.,
Algorithm 2. Before calling Algorithm 2, it first constructs an index, which contains
a vertex set V ′, an edge set E′, a function 𝛽 ′ mapping each edge in E′ to a probability
value, and a set r storing the rank values of vertices. In V ′, vertices are ordered by
the rank values to ensure that the vertices contained in more and larger 𝛼-MCs are pro-
cessed early. During enumeration, Algorithm 2 computes 𝛼-MCs based on the index.
Note that, according to the rank value r(u) of vertex u, line 19 checks whether u sat-
isfies the pruning condition. If so, u is pruned. Such an index helps to prune vertices
to avoid redundant enumerations of the 𝛼-MCs that are not chosen.

Analysis. We notice that Algorithm 2 is similar to Algorithm 1. The difference is the pruning condition in line 19, which takes constant time. Therefore, the time complexity of Algorithm 2 is O(|V′| ⋅ 2^{|V′|}).


5.1 Degree‑based index

Based on the fact that vertices with large degrees are more likely to be contained
in large 𝛼-MCs, we propose a degree-based index by sorting vertices in descending
order according to their degrees. With this index, vertices with large degrees can be
processed first such that larger 𝛼-MCs can be produced as early as possible.

Theorem 1 For a vertex u and a maximal clique C, if d(u) ≤ |C| − 1, there cannot exist a maximal clique C′ containing u with |C′| > |C|.

Proof of Theorem 1 Based on Definition 4, for all v in C, we have d(v, C) = |C| − 1 and d(v) ≥ |C| − 1. If there exists a maximal clique C′ containing u with |C′| > |C|, then d(u) ≥ |C′| − 1 > |C| − 1. Therefore, if d(u) ≤ |C| − 1, there cannot exist a maximal clique C′ containing u with |C′| > |C|.  ◻

From Theorem 1, we can set the pruning condition Pruning(r(u)) as d(u) ≤ Smallest(R).size − 1 (line 19 in Algorithm 2). The vertices that satisfy the pruning condition will be pruned without missing any correct result.
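As a minimal illustration (not the paper's code), the pruning test can be written as follows, where R is the current result set of at most k 𝛼-MCs and degree[u] plays the role of r(u); the core-based index of Sect. 5.2 uses core(u) in place of d(u):

```python
def can_prune(u, degree, R, k):
    """Pruning condition of line 19 in Algorithm 2 for the degree-based index."""
    if len(R) < k:
        return False                      # fewer than k results so far: pruning is not safe yet
    smallest = min(len(C) for C in R)     # Smallest(R).size
    return degree[u] <= smallest - 1      # by Theorem 1, u cannot be in a larger alpha-MC
```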

Example 2 Considering Fig. 1a, firstly, the degree-based index is constructed by sorting
vertices in descending order according to their degrees, as shown as G′ in Fig. 1b. Dur-
ing enumeration, given 𝛼 = 0.3 and k = 2, Algorithm 2 computes 𝛼-MCs starting from
the first vertex v1 whose degree is 7 based on the degree-based index. Then, the first 2 𝛼
-MCs {v1 , v2 } and {v1 , v6 , v7 } are found and added into the result set R. Since there are
already 2 𝛼-MCs in R, it needs to check the following vertices whether they satisfy the
pruning condition. Since d(v8 ) > |{v1 , v2 }| − 1, as well as v9, the 𝛼-MC {v1 , v8 , v9 } is
found, which will replace the smallest one {v1 , v2 }. After checking whether v10 satisfies
the pruning condition, v10 is pruned since d(v10 ) ≤ |{v1 , v6 , v7 }| − 1. Similarly, v11 can
also be pruned. Next, 𝛼-MC {v2 , v3 , v4 , v5 } is found by computing from v2, and it will
replace {v1 , v6 , v7 }. Finally, we get the result set R = {{v2 , v3 , v4 , v5 }, {v1 , v8 , v9 }}.

The advantage of the degree-based index is that it prunes the unpromising vertices of small degrees to avoid enumerating numerous 𝛼-MCs that are not chosen. Compar-
ing Example 2 with Example 1, the degree-based index helps avoid enumerating
the 𝛼-MCs {v1 , v10 } and {v1 , v11 }, both of which contain the vertices satisfying the
pruning condition.
Analysis. The time complexity of the degree-based index construction is O(|V|)
due to counting sort. The degree-based index needs to store the degrees, new IDs,
edges and their probabilities; thus, the space complexity of the degree-based index
is O(|V| + |E|).
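A minimal sketch of the ordering step of the degree-based index construction is shown below (assumption: the graph is given as adjacency sets adj; the stored edge probabilities and the renumbering of vertex IDs mentioned above are omitted). It uses counting sort, matching the O(|V|) bound in the analysis:

```python
def degree_based_order(adj):
    """Order vertices by descending degree with counting sort; also return r(v) = d(v)."""
    dmax = max((len(nbrs) for nbrs in adj.values()), default=0)
    buckets = [[] for _ in range(dmax + 1)]        # buckets[d] holds vertices of degree d
    for v, nbrs in adj.items():
        buckets[len(nbrs)].append(v)
    order = []
    for d in range(dmax, -1, -1):                  # largest degrees first
        order.extend(buckets[d])
    rank = {v: len(adj[v]) for v in adj}           # rank values used by the pruning condition
    return order, rank
```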


5.2 Core‑based index

Although the degree-based index can help avoid enumerating small 𝛼-MCs {v1 , v10 }
and {v1 , v11 }, considering Example 2, the not chosen 𝛼-MC {v1 , v6 , v7 } is produced ear-
lier than the largest one {v2 , v3 , v4 , v5 }. This is because v1 has a larger degree but is
contained in the smaller 𝛼-MCs. Vertices with large degrees may also be contained in
many small cliques. Degree lacks constraint on the tightness between the vertices of a
clique. Core number [43] is a more efficient metric. From Definition 8 and 9, we know
that a k-core is the maximal connected induced subgraph G� [V � ] in which each vertex
u has a degree d(u, G� ) of at least k. The larger the core number k of a vertex, the more
neighbors in its maximal connected induced subgraph. This means that such vertices
are more likely to be contained in large 𝛼-MCs. We propose a core-based index by sort-
ing vertices in descending order according to their core numbers. Note that, before sort-
ing, the core numbers need to be computed using the algorithm proposed by [43]. With
this index, vertices with large core numbers can be processed first such that larger 𝛼
-MCs can be produced as early as possible.

Theorem 2 For a vertex u and a maximal clique C, if core(u) ≤ |C| − 1, there cannot exist a maximal clique C′ containing u with |C′| > |C|.

Proof of Theorem 2 If there exists a maximal clique C′ containing u with |C′| > |C|, then d(u) ≥ |C′| − 1 > |C| − 1. Based on Definition 8, C′ is contained in a k-core with k = |C′| − 1. Based on Definition 9, since core(u) is the largest k such that a k-core contains u, we have core(u) ≥ |C′| − 1 > |C| − 1. Therefore, if core(u) ≤ |C| − 1, there cannot exist a maximal clique C′ containing u with |C′| > |C|.  ◻

From Theorem 2, we can set the pruning condition Pruning(r(u)) as core(u) ≤ Smallest(R).size − 1. The vertices that satisfy the pruning condition will be
pruned without missing any correct result.

Example 3 Considering Fig. 1a, firstly, the core-based index is constructed by com-
puting core number for each vertex, i.e., core = {2, 2, 2, 1, 1, 2, 2, 3, 3, 3, 3}. Then,
sort the vertices in descending order according to their core numbers, as shown as G′′
in Fig. 1c. During enumeration, given 𝛼 = 0.3 and k = 2, Algorithm 2 computes 𝛼-
MCs starting from the first vertex v1 whose core number is 3 based on the core-based
index. The first 2 𝛼-MCs {v1 , v2 , v3 , v4 } and {v1 , v7 } are found. Since there are already
2 𝛼-MCs in R, it needs to check the following vertices whether they satisfy the prun-
ing condition. Next, since core(v5 ) > |{v1 , v7 }| − 1, as well as v6 and v7, {v5 , v6 , v7 }
is found, which will replace {v1 , v7 }. After checking whether v7 satisfies the pruning
condition, v7 is pruned since core(v7 ) ≤ |{v5 , v6 , v7 }| − 1. Finally, we get the result set
R = {{v1 , v2 , v3 , v4 }, {v5 , v6 , v7 }}.

Comparing Example 3 with Example 2, we find that based on the degree-based index, Algorithm 2 enumerates 4 𝛼-MCs, while based on the core-based index,


Algorithm 2 enumerates 3 𝛼-MCs. This shows that the core-based index is more effective than the degree-based index because of its stronger constraint on the tightness between the vertices of a clique.
Analysis. The core-based index construction is composed of two parts. One part is
computing core numbers, which takes O(|E|) time [43]. The other one is sorting, which
takes O(|V|) time by counting sort. Therefore, the time complexity of core-based index
construction is O(|V| + |E|). The core-based index needs to store the core numbers
and new IDs of vertices, edges and their probabilities, thus the space complexity of the
core-based index is O(|V| + |E|).
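For completeness, a simple sketch of the core-number computation is given below. It uses straightforward minimum-degree peeling for clarity rather than the O(|E|) bucket implementation of [43], and orders vertices with Python's built-in sort instead of counting sort; it is only meant to illustrate the construction:

```python
def core_numbers(adj):
    """core(u) for every vertex, by repeatedly peeling a minimum-degree vertex."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=deg.__getitem__)    # vertex of minimum remaining degree
        k = max(k, deg[v])                         # the current core level never decreases
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                deg[w] -= 1
    return core

def core_based_order(adj):
    core = core_numbers(adj)
    order = sorted(adj, key=core.__getitem__, reverse=True)   # descending core number
    return order, core
```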
Correctness of Algorithm 2. Algorithm 2 employs the pruning strategy to prune
the unpromising vertices that are not contained in any larger 𝛼-MC than the current
smallest result based on Theorem 1 and Theorem 2. It means that we only drop the
unpromising results without missing any correct one.

6 Support‑based pruning

Considering Example 3, the 𝛼-MC {v1 , v7 }, which is not chosen, is still produced earlier than many larger ones. This is because all the vertices in {v1 , v7 } have large core numbers, such that we cannot prune them by the pruning condition.

Theorem 3 For an edge (u, v), if sup(u,v) ≤ 1, there cannot exist a maximal clique C containing (u, v) such that |C| > 3.

Proof of Theorem 3 For a maximal clique C and any edge (u, v) in C, u and v have at least |C| − 2 common neighbors. Based on Definition 10, we have sup(u,v) ≥ |C| − 2. If sup(u,v) ≤ 1, then |C| − 2 ≤ 1, i.e., |C| ≤ 3. Therefore, if sup(u,v) ≤ 1, there cannot exist a maximal clique C containing (u, v) such that |C| > 3.  ◻

From Theorem 3, we propose a support-based pruning. Before constructing the index, it deletes the edges with support values no larger than 1 to avoid enumerating the MCs of size no larger than 3, such as independent edges and triangles. Such preprocessing
reduces the size of the input graph and further avoids enumerating the 𝛼-MCs that are
not chosen. For example, in Fig. 1a, it will delete the edges (v1 , v2 ), (v1 , v3 ), (v2 , v3 ),
(v3 , v4 ), (v3 , v5 ), (v3 , v6 ), (v3 , v7 ), (v3 , v8 ) and (v6 , v7 ). Obviously, the size of input graph
is reduced, and the 𝛼-MCs with sizes no larger than 3 will not be computed, such as
{v1 , v2 , v3 }, {v3 , v6 , v7 }, {v3 , v4 }, {v3 , v5 }, and {v3 , v8 }.
Specifically, Algorithm 3 first computes the support values by Function Support
(line 2) by computing the number of common neighbors of two vertices of each edge
(lines 4 to 7) according to Definition 10. Next, it deletes the edges with support values no
larger than 1. Additionally, all the triangles containing the deleted edges are stored in
descending order according to their clique probabilities to supplement the results when
there are fewer than k results (lines 13 to 14). After the support-based pruning, the


Table 4  The support value of each edge in the original graph in Fig. 1a
Edge (v1 , v2 ) (v1 , v3 ) (v2 , v3 ) (v3 , v4 ) (v3 , v5 ) (v3 , v6 ) (v3 , v7 ) (v3 , v8 )
Support value 1 1 1 0 0 1 1 0

Edge (v6 , v7 ) (v8 , v9 ) (v8 , v10 ) (v8 , v11 ) (v9 , v10 ) (v9 , v11 ) (v10 , v11 )
Support value 1 2 2 2 2 2 2

degree-based index or core-based index is constructed. Finally, Algorithm 2 computes the top k 𝛼-MCs based on the index.
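Algorithm 3's pseudocode is not reproduced in this excerpt; the sketch below illustrates the preprocessing it describes, assuming adjacency sets adj and the edge-probability map beta (frozenset keys). The ≥ 𝛼 filter on the stored triangles is an added assumption, since only triangles that are themselves 𝛼-cliques can usefully supplement the result set:

```python
def support_prune(adj, beta, alpha):
    """Delete edges with sup(u, v) <= 1; keep triangles through deleted edges as backups."""
    pruned = {v: set(nbrs) for v, nbrs in adj.items()}
    tri = {}                                           # triangle -> clique probability
    for e in beta:
        u, v = tuple(e)
        common = adj[u] & adj[v]                       # sup(u, v) = |N(u) ∩ N(v)|
        if len(common) <= 1:
            pruned[u].discard(v)                       # delete the low-support edge
            pruned[v].discard(u)
            for w in common:                           # triangles containing the deleted edge
                t = frozenset({u, v, w})
                p = beta[e] * beta[frozenset({u, w})] * beta[frozenset({v, w})]
                if p >= alpha:                         # assumption: keep only alpha-cliques
                    tri[t] = p
    backups = sorted(tri.items(), key=lambda kv: kv[1], reverse=True)
    return pruned, backups
```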

Example 4 Support-based pruning on the degree-based index. Considering Fig. 1a, firstly, the support value for each edge is computed as shown in Table 4. Next, delete
the edges with support values no larger than 1 to avoid enumerating the small 𝛼-MCs
with sizes no larger than 3. At the same time, the triangles {v3 , v6 , v7 } and {v1 , v2 , v3 }
are stored in descending order according to their clique probabilities. Then, the degree-
based index is constructed as shown in G′′′ in Fig. 1d. During the enumeration, given
𝛼 = 0.3 and k = 2, Algorithm 2 computes 𝛼-MCs starting from the first vertex v1 whose
degree is 3 based on G′′′. Then, the first 𝛼-MC {v1 , v2 , v3 , v4 } is found. Since there are
no other 𝛼-MCs, it adds the triangle with the largest clique probability to the result
set R. Finally, we get the result set R = {{v1 , v2 , v3 , v4 }, {v7 , v10 , v11 }}.

Example 5 Support-based pruning on the core-based index. Considering Fig. 1a, similar to Example 4, firstly, the support values are computed and the edges with sup-
port values no larger than 1 are deleted. Then, the core-based index is constructed as
G′′′ in Fig. 1d shows. During the enumeration, given 𝛼 = 0.3 and k = 2, Algorithm 2


computes 𝛼-MCs starting from the first vertex v1 whose core number is 3 based on G′′′.
Then, the first 𝛼-MC {v1 , v2 , v3 , v4 } is found. Since there are no other 𝛼-MCs, it needs
to add the triangle with the largest clique probability to R. Finally, we get the result set
R = {{v1 , v2 , v3 , v4 }, {v7 , v10 , v11 }}.

Comparing Example 4 and Example 5 with Example 2 and Example 3, Algorithm 2 enumerates only 1 𝛼-MC based on the index with support-based pruning, while Algo-
rithm 2 enumerates 4 𝛼-MCs only based on the degree-based index and 3 𝛼-MCs only
based on the core-based index.
Analysis. For the support-based pruning, the time complexity is composed of two
parts. The first part is computing support values. Since line 6 takes O(dmax ) time, comput-
ing support values takes O(|E| ⋅ dmax ) time. The second part is deleting edges, which
takes O(|E|) time. Therefore, the time complexity is O(|E| ⋅ dmax ). Furthermore, the
support-based pruning requires additional storage for the triangles, thus the space com-
plexity is O(|V| + |E|).

7 Experiments

We conducted extensive experiments to evaluate the performance of our algorithms. The compared algorithms include:

(1) Baseline (described in Sect. 4);
(2) KMC+ (approach of [42]);
(3) IKC_D (IKC based on the degree-based index);
(4) IKC_C (IKC based on the core-based index);
(5) IKC_D+ (IKC based on the degree-based index with support-based pruning);
(6) IKC_C+ (IKC based on the core-based index with support-based pruning).

All algorithms were implemented in C++ and compiled by Microsoft Visual Studio
2019. All the experiments were conducted on the operating system of Windows 10, 64
bits, with an Intel Core i5 2.80GHz CPU and 8GB DDR3-RAM.
Datasets. We used 20 datasets to evaluate the performance of all the algorithms. The
datasets include: email-Eu-core1, Email-Enron1, Slashdot08111, Email-EuAll1, web-Google1, 10cit-Patent1, 05cit-Patent1, WikiTalk1, cit-Patents1, amaze2, arxiv3, yago4, citeseer5, 05citeseerx5, pubmed6, dbpedia7, anthra8, mtbrv8, agrocyc8, ecoo8.

1 snap.stanford.edu/
2 www.amaze.ulb.ac.be
3 arxiv.org
4 yago-knowledge.org/
5 citeseer.ist.psu.edu
6 pubmed.ncbi.nlm.nih.gov/
7 www.dbpedia.org
8 ecocyc.org


Table 5  Statistics of datasets

Dataset |V| |E| d dmax
email-Eu-core 1,005 16,064 31.9 345


amaze 3,342 3,600 2.2 2,755
arxiv 6,000 66,707 22.2 700
yago 6,642 42,392 12.8 2,371
pubmed 9,000 40,028 8.9 432
mtbrv 9,602 10,245 2.1 4,004
citeseer 10,720 44,258 8.3 192
anthra 12,495 13,104 2.1 5,401
ecoo 12,620 13,350 2.1 5,435
agrocyc 12,684 13,408 2.1 5,486
Email-Enron 36,692 183,813 10.0 1,383
Slashdot0811 77,360 413,994 10.7 2,251
Email-EuAll 231,000 223,004 1.9 168,815
web-Google 371,764 517,805 2.8 243,794
10cit-Patent 1,097,775 1,651,894 3.0 85
05citeseerx 1,457,057 3,002,252 4.1 53,452
05cit-Patent 1,671,488 3,303,789 4.0 139
WikiTalk 2,281,879 2,311,570 2.0 2,249,742
dbpedia 3,365,623 7,989,191 4.7 1,991,836
cit-Patents 3,773,768 16,518,947 8.8 793

The dataset email-Eu-core was generated using email data from a large European
research institution and describes the communication between the core members;
amaze describes functional and physical interactions among biochemical entities;
arxiv is a free distribution service and an open-access archive for scholarly articles in
the fields of physics, mathematics, computer science, quantitative biology, quantitative
finance, statistics, electrical engineering and systems science, and economics; pubmed
describes citations for biomedical literature from MEDLINE, life science journals, and
online books; citeseer and 05citeseerx describe the scientific literature network in
the CiteSeerX library; anthra , mtbrv , agrocyc , and ecoo describe the network in
biology between the biochemistry and the genome; Email-Enron describes the e-mail
communication network of the American Enron company; Slashdot0811 describes
the user relationship network (friend/foe links) of the Slashdot website; web-Google
describes the links of web pages from Google; WikiTalk describes all the users and dis-
cussion from the inception of Wikipedia; dbpedia is a project aiming to extract struc-
tured content from the information created in the Wikipedia project and allows users
to semantically query relationships and properties of Wikipedia resources, including
links to other related datasets; cit-Patents describes US patent dataset network, which
includes the utility patents and citations; yago is a large knowledge base with general
knowledge about people, cities, countries, movies, and organizations.
These datasets represent relevant knowledge in different fields. Detailed statistics of
these datasets are summarized in Table 5, where |V|, |E|, d, and dmax denote the number of vertices, the number of edges, the average degree, and the maximum degree, respectively.


7.1 Performance of enumeration

From Sect. 4, we know that the running time of Function Enum_Clique dominates the
whole performance. This means that the fewer cliques an algorithm computes, the more efficient it is.
Therefore, the evaluations include the number of cliques and the running time of
enumeration.
The number of cliques. Baseline needs to compute all the 𝛼-MCs. KMC+ needs to
compute all the MCs and then compute 𝛼-MCs. IKC only computes large 𝛼-MCs by the
pruning strategy based on an index.
Table 6 shows the number of cliques enumerated by all algorithms with the param-
eters 𝛼 = 0.3 and k = 15. Column 2 shows the number of 𝛼-MCs enumerated by Base-
line, and column 3 shows the number of their subgraphs enumerated by Baseline,
denoted by |m| and |s|, respectively. Column 4 shows the number of MCs and 𝛼-MCs
enumerated by KMC+, and column 5 shows the number of their subgraphs enumerated
by KMC+. Columns 6 and 7 show the numbers of 𝛼-MCs and the early terminated enu-
merations by IKC_D. Other columns are similar. Note that, the total number of cliques
computed by the algorithm is equal to the sum of |m| and |s|. However, the total number
of cliques is not equal to the actual total number of 𝛼-MCs in the graph. The reason is that
the |m| of Baseline is the actual total number of 𝛼-MCs in the graph as shown in Table 3,
and |s| is the redundant computation, such as a clique that is contained in a result 𝛼-MC.
Similarly, the |m| of KMC+ is the actual total number of the MCs and the not pruned 𝛼
-MCs, and |s| is the redundant computation. The |m| of IKC is the number of not pruned
𝛼-MCs, and |s| is the redundant computation, including the cliques contained by the
results and the enumerations terminated early by its pruning strategy. Due to their dif-
ferent methods and pruning strategies, these three values (|m|, |s|, and the sum) of dif-
ferent algorithms are not equal.
Comparing IKC_D with Baseline, on 05citeseerx, 05cit-Patent, WikiTalk,
dbpedia, and cit-Patents, Baseline cannot finish within 12 hours. The total number
of cliques enumerated by IKC_D is on average about 20 times less than Baseline.
IKC_D is more efficient on sparse datasets, such as agrocyc, ecoo, and anthra, as
the total numbers are over 20 times less. Especially on WikiTalk and Email-EuAll,
the total numbers are, respectively, 376 and 45 times less. Comparing IKC_C with
Baseline, the total number of cliques enumerated by IKC_C is on average over 120
times less than Baseline. Similarly, IKC_C is more efficient on sparse datasets, such
as agrocyc, ecoo, anthra, and Email-EuAll, as the total numbers are over 70 times
less. Especially on WikiTalk, the total numbers are 2132 times less. The results dem-
onstrate that the index-based algorithm is more efficient than Baseline, especially
on sparse datasets, since the index-based index helps prune numerous vertices with
small degrees or core numbers.
Comparing IKC_D with KMC+, on dbpedia and cit-Patents, KMC+ cannot fin-
ish within 12 h. The total number of cliques enumerated by IKC_D is on average
about 10 times less than KMC+. IKC_D is more efficient on dense datasets, such as
email-Eu-core, Email-Enron, and Slashdot0811, as the total numbers are over 23
times less. This is because, for dense datasets, the MCs enumerated by KMC+ usu-
ally have large sizes and small probabilities, which means KMC+ needs to further

Table 6  The number of cliques that all algorithms enumerate, when 𝛼 = 0.3, k = 15
Dataset Baseline KMC+ IKC_D IKC_C IKC_D+ IKC_C+
|m| |s| |m| |s| |m| |s| |m| |s| |m| |s| |m| |s|

email-Eu-core 10,957 3647 44,112 522,499 26 13,854 33 13,738 26 13,411 33 13,293


amaze 2,978 2,892 201 457 28 379 30 139 15 2 15 1
arxiv 62,213 22,338 113,507 1,292,180 27 79,566 31 77,679 27 73,817 31 71,956
yago 31,662 15,294 32,928 52,243 15 39,780 28 38,589 15 22,613 26 21,928
pubmed 29,135 8,167 11,667 30,118 31 28,161 32 27,398 31 7,719 31 7,365
mtbrv 8,627 9,345 354 5,264 25 831 24 270 15 7 15 2
citeseer 32,634 12,296 17,933 61,124 29 34,652 38 32,566 29 18,865 38 17,297
anthra 10,992 12,141 356 7,493 30 781 30 257 15 3 15 1
ecoo 11,283 12,271 424 6,963 28 901 30 284 15 2 15 1

agrocyc 11,265 12,357 415 6,958 28 887 30 284 15 2 15 1


Email-Enron 176,997 94,034 635,649 5,702,687 75 177,671 76 167,697 60 173,093 76 165,182
Slashdot0811 294,906 62,522 478,370 6,034,908 32 279,834 30 266,752 32 88,014 30 86,532
Email-EuAll 176,514 79,139 7,053 26,804 15 5643 15 3,680 15 162 22 146
web-Google 381,589 211,250 115,292 219,366 22 97,402 30 39,343 23 24,650 30 16,222
10cit-Patent 1,294,268 603,660 85,925 712,725 39 649,031 45 121,658 42 13,619 34 6,773
05citeseerx – – 512,900 1,654,149 30 1,374,477 44 1,309,423 31 298,175 30 267,564
05cit-Patent – – 334,347 1,409,597 34 1,697,772 38 890,289 31 101,349 38 52,143
WikiTalk – – 30,790 62,148 15 10,895 28 1,898 24 121 21 105
dbpedia – – – – 30 3,826,585 60 2,999,112 30 1,619,363 45 1,535,497
cit-Patents – – – – 46 11,189,444 33 10,270,856 45 2,821,200 33 2,290,891

(m: 𝛼-MCs, s: subgraphs.)



compute more 𝛼-MCs on them. Comparing IKC_C with KMC+, the total number of
cliques enumerated by IKC_C is about 17 times less than KMC+ on average. Espe-
cially on email-Eu-core, Email-Enron, WikiTalk, and anthra, the total numbers
IKC_C are over 24 times less. The results demonstrate that the index-based algo-
rithm is more efficient than KMC+.
Comparing IKC_C with IKC_D, the total number of cliques enumerated by
IKC_C is on average 2 times less than IKC_D. Especially on 10cit-Patent and Wiki-
Talk, the total numbers are over 5 times less. The results demonstrate that the core-
based index is more efficient than the degree-based index. Additionally, both IKC_C
and IKC_D have better performance on sparse datasets. For example, benefiting from
the pruning condition, Email-EuAll is much larger than Slashdot0811 and Email-
Enron, but the sum of |m| and |s| is much less.
Next, we test the two optimized indexes, which perform additional support-based
pruning strategy. The total number of cliques enumerated by IKC_D+ is on average
20 times less than IKC_D, especially on sparse datasets, such as WikiTalk, 10cit-
Patent, and Email-EuAll, as the total numbers are, respectively, about 75, 47, and 31
times less. The total number of cliques enumerated by IKC_C+ is on average 9 times
less than IKC_C, especially on sparse datasets, such as Email-EuAll, 10cit-Patent,
and 05cit-Patent, as the total numbers are, respectively, about 22, 18, and 17 times
less. The efficiency of the support-based pruning mainly benefits from the reduced
sizes of input graphs. For example, Email-EuAll is much larger than Slashdot0811
and Email-Enron, but the sum of |m| and |s| is much less. This is because the sup-
port-based pruning deletes more than 99% of edges on Email-EuAll, and 73% and 12% of edges
on Slashdot0811 and Email-Enron, respectively, as shown in Table 7, where |Ed | denotes the
number of the deleted edges. The support-based pruning is more efficient on sparse
datasets because it deletes more than 90% of edges on sparse datasets. Additionally,
the results demonstrate that the support-based pruning on the degree-based index
is more obvious than that on the core-based index. This is because the core-based
index is already tightly constrained between the vertices of a clique.
The running time of enumeration. The running time of all algorithms with
𝛼 = 0.3 and k = 15 is shown in Table 8.
Comparing IKC_D with Baseline, IKC_D is on average 183 times faster. Espe-
cially on sparse datasets, such as Email-EuAll, IKC_D is much faster. Comparing
IKC_C with Baseline, IKC_C is on average 720 times faster. Especially on sparse
datasets, such as Email-EuAll, WikiTalk and web-Google, IKC_C is much faster.
The results demonstrate that IKC_D and IKC_C are more efficient than Baseline.
Comparing IKC_D with KMC+, IKC_D is on average 31 times faster. Especially on
sparse datasets, such as Email-EuAll and WikiTalk. Comparing IKC_C with KMC+,
IKC_C is on average 219 times faster. Especially on sparse datasets, such as Email-
EuAll and WikiTalk. The results demonstrate that IKC_D and IKC_C are more efficient
than KMC+.
Comparing IKC_C with IKC_D, IKC_C is on average 3.6 times faster. Especially
on 10cit-Patent and WikiTalk, IKC_C is over 10 times faster than IKC_D. The results
demonstrate that the core-based index is more efficient than the degree-based index.
Similar to the above, both IKC_C and IKC_D have better performance on sparse data-
sets due to the pruning condition, such as on Email-EuAll.


Table 7  The number of deleted edges by the optimization

Dataset |Ed | |Ed |∕|E| (%)
email-Eu-core 647 4.03
amaze 3,534 49.08
arxiv 6,948 10.42
yago 18,538 43.73
pubmed 29,673 74.13
mtbrv 10,129 98.87
citeseer 21,088 47.65
anthra 12,985 49.55
ecoo 13,224 99.06
agrocyc 13,291 49.56
Email-Enron 22,598 12.29
Slashdot0811 303,689 73.36
Email-EuAll 221,733 99.43
web-Google 438,782 84.74
10cit-Patent 1,602,814 97.03
05citeseerx 2,553,941 85.07
05cit-Patent 3,052,933 92.41
WikiTalk 2,308,171 99.85
dbpedia 5,511,486 68.99
cit-Patents 12,141,379 73.50

Comparing the two optimized indexes, which perform the additional support-based pruning strategy, IKC_D+ is on average 131 times faster than IKC_D. Especially on
sparse datasets, such as 10cit-Patent and 05cit-Patent, IKC_D+ is more than 1415
and 225 times faster respectively. IKC_C+ is on average 55 times faster than IKC_C.
Especially on sparse datasets, such as 10cit-Patent and 05cit-Patent, IKC_C+ is more
than 341 and 309 times faster, respectively. Similar to the above, both IKC_C+ and
IKC_D+ have better performance on sparse datasets due to the support-based pruning.
Additionally, these results show that the support-based pruning on the degree-based
index is more obvious than that on the core-based index. The reason has been analyzed
previously.
Note that, comparing KMC+ with Baseline, although the total number of cliques
enumerated by KMC+ is more than Baseline, KMC+ is still about 10 times faster on
average. The reason is that the calls to Function Enum_Clique by KMC+ consist of two
parts. One part computes MCs and does not compute probabilities; the other part computes
𝛼-MCs on the MCs and does not compute common neighbors. Additionally, on the dense datasets, such as email-Eu-core and arxiv,
KMC+ is slower than Baseline. This is because on dense datasets, the MCs enumerated
by KMC+ usually have large sizes and small probabilities, which means KMC+ needs
to further compute more 𝛼-MCs on them.
Vary 𝛼. The running time of enumeration for all algorithms when fixing k = 10
and varying 𝛼 from 0.1 to 0.9 on Slashdot0811 is shown in Fig. 2a. The running time
shows a similar trend on the other datasets. We can find that, for all algorithms, the


Table 8  Running time (ms) of all algorithms, when 𝛼 = 0.3, k = 15


Dataset Baseline KMC+ IKC_D IKC_C IKC_D+ IKC_C+

email-Eu-core 66 441 8 8 8 8
amaze 36 7 3 1 1 1
arxiv 257 829 57 54 45 44
yago 182 71 15 22 4 9
pubmed 298 75 33 23 6 5
mtbrv 314 33 5 2 1 1
citeseer 367 103 48 41 15 13
anthra 439 63 5 3 1 1
ecoo 472 53 4 2 1 1
agrocyc 554 64 5 3 1 1
Email-Enron 11,890 4,531 263 211 203 180
Slashdot0811 63,330 7,610 1,597 1,201 99 96
Email-EuAll 662,834 21,202 347 99 8 10
web-Google 1,873,600 59,007 8,247 1,010 114 42
10cit-Patent 19,923,600 775,730 225,111 14,012 159 41
05citeseerx – 1,463,166 171,811 158,411 4,422 3,517
05cit-Patent – 2,252,258 706,232 243,001 3,134 785
WikiTalk – 3,108,793 9,592 912 179 178
dbpedia – – 1,636,640 634,398 21,501 12,377
cit-Patents – – 5,962,600 4,632,810 462,358 230,500

running time decreases as 𝛼 grows because the number of cliques decreases. Note that,
KMC+ shows a different trend on dense datasets such as Slashdot0811 than on sparse
datasets such as yago in [42].
large sizes and small probabilities such that KMC+ must further compute more 𝛼-MCs
on them. However, on sparse datasets the MCs usually have large probabilities such that
most of them can avoid further computations when 𝛼 is small.
IKC_C+ and IKC_D+ are much faster than IKC_C and IKC_D for every 𝛼 and the
gap between them gradually increases as 𝛼 grows. The first reason is that the size of
input graph is reduced after the support-based pruning, which reduces the running time.
The second reason is that the input graph is denser after support-based pruning such
that the number of cliques with probabilities no less than 𝛼 decreases as 𝛼 increases.
Vary k . The running time of enumeration for all algorithms when fixing 𝛼 = 0.3
and varying k from 10 to 300 on Slashdot0811 is shown in Fig. 2b. The running time
shows a similar trend on the other datasets. We can find that, for IKC, the running time
increases smoothly as k grows. This is because as k grows the pruning condition is triggered later, resulting in fewer pruned vertices; the later triggering has only a small effect on the pruning. In contrast, Baseline takes the same time for each k because it needs to compute all the 𝛼-MCs.
Scalability. In this experiment, we test the scalability of IKC_D, IKC_C, IKC_D+,
and IKC_C+ on Slashdot0811. Specifically, we generate four subgraphs by randomly
sampling 20-80% of the edges and vertices, respectively. We test the running time of


Fig. 2  Running time (ms) on Slashdot0811: a different 𝛼 with k = 10 fixed; b different k with 𝛼 = 0.3 fixed

four algorithms with the parameters fixed at 𝛼 = 0.3 and k = 15, as shown in Fig. 3.
From the results we find that, the four algorithms show near-linear scalability with
a varying |E| or |V|. Especially, IKC_D+ and IKC_C+ have a much smoother slope.
For example, when we sample 20% vertices, the running time of IKC_D and IKC_C
is about 40 ms, and that of IKC_D+ and IKC_C+ is about 10 ms. However, when we
sample 80% vertices, the running time of IKC_D and IKC_C is about 700 ms, and that
of IKC_D+ and IKC_C+ is only about 50 ms. These results show that the four algo-
rithms are scalable in practice, and IKC_D+ and IKC_C+ have better scalability than
IKC_D and IKC_C.

7.2 Performance of index construction

Index size. As Sects. 5 and 6 describe, the degree-based index needs to store the vertex
degrees, new IDs, edges and their probabilities. The core-based index needs to store

Fig. 3  Comparison of scalability (Slashdot0811, 𝛼 = 0.3, k = 15)


Table 9  The sizes (MB) of the four indexes

Dataset I_D I_C I_D+ I_C+
email-Eu-core 0.192 0.192 0.190 0.190
amaze 0.108 0.108 0.071 0.071
arxiv 0.809 0.809 0.791 0.791
yago 0.536 0.536 0.585 0.585
pubmed 0.527 0.527 0.305 0.305
mtbrv 0.191 0.191 0.081 0.081
citeseer 0.588 0.588 0.484 0.484
anthra 0.395 0.395 0.253 0.253
ecoo 0.249 0.249 0.106 0.106
agrocyc 0.404 0.404 0.259 0.259
Email-Enron 2.384 2.384 2.214 2.214
Slashdot0811 5.328 5.328 2.902 2.902
Email-EuAll 4.314 4.314 1.910 1.910
web-Google 8.762 8.762 5.605 5.605
10cit-Patent 27.280 27.280 10.410 10.410
05citeseerx 45.475 45.475 22.370 22.370
05cit-Patent 50.561 50.561 20.777 20.777
WikiTalk 43.863 43.863 18.035 18.035
dbpedia 117.107 117.107 100.628 100.628
cit-Patents 217.836 217.836 122.030 122.030

The two optimized indexes, which additionally perform support-based pruning, require extra storage for the triangles and their clique probabilities. Note that, after the support-based pruning, the optimized indexes require less storage for the remaining edges and their clique probabilities. The sizes of the four indexes are shown in Table 9, where I_D denotes the degree-based index, I_C the core-based index, and I_D+ and I_C+ the corresponding optimized indexes.
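Purely as an illustration of the layout described above (the field and type names below are ours, not taken from the paper), the two basic indexes and their optimized variants can be pictured as one record per graph:

    from dataclasses import dataclass, field
    from typing import Dict, Tuple

    Edge = Tuple[int, int]            # an edge identified by its two (new) vertex IDs
    Triangle = Tuple[int, int, int]   # a triangle identified by its three vertex IDs

    @dataclass
    class UncertainCliqueIndex:
        """Sketch of the stored fields.

        order_key:     degree of each vertex (I_D) or core number of each vertex (I_C)
        new_id:        position of each vertex after sorting in descending order_key
        edge_prob:     existence probability of every (remaining) edge
        triangle_prob: clique probability of every triangle; only populated in the
                       optimized indexes I_D+ and I_C+, which also drop the edges
                       removed by support-based pruning from edge_prob
        """
        order_key: Dict[int, int] = field(default_factory=dict)
        new_id: Dict[int, int] = field(default_factory=dict)
        edge_prob: Dict[Edge, float] = field(default_factory=dict)
        triangle_prob: Dict[Triangle, float] = field(default_factory=dict)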
Construction time. As Sects. 5 and 6 describe, the construction time of the degree-based index depends on the time of sorting; that of the core-based index includes the time of computing the core numbers and sorting; and that of the optimized indexes needs additional time to compute the support values and delete the pruned edges. The construction time of the four indexes is shown in Table 10.
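A sketch of the sorting step shared by both basic indexes is given below, assuming a counting sort keyed on the degree (for I_D) or the core number (for I_C); the function name and interface are ours:

    def assign_new_ids(order_key):
        """Counting-sort the vertices in descending key order and return new IDs.

        `order_key` maps each vertex to its degree (degree-based index) or to its
        core number (core-based index). The cost is O(|V| + max_key), so a narrower
        key range -- e.g. core numbers, which are more concentrated than degrees on
        sparse graphs -- means fewer buckets to scan.
        """
        max_key = max(order_key.values(), default=0)
        buckets = [[] for _ in range(max_key + 1)]
        for v, k in order_key.items():
            buckets[k].append(v)

        new_id, next_id = {}, 0
        for k in range(max_key, -1, -1):      # descending degree / core number
            for v in buckets[k]:
                new_id[v] = next_id
                next_id += 1
        return new_id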
Comparing column 3 with column 2, the core-based index needs to compute the core numbers before sorting; thus, constructing the core-based index generally takes more time than constructing the degree-based index. However, on sparse datasets such as Email-EuAll, web-Google, 10cit-Patent, and WikiTalk, the construction time of the core-based index is less than that of the degree-based index. This is because, for sparse datasets, core numbers are more concentrated than degrees, which makes the counting sort for the core-based index more efficient than that for the degree-based index. The comparison between columns 5 and 4 is similar. Comparing column 4 with column 2 and column 5 with column 3, the optimization additionally needs to compute the support values and delete edges, so the construction time of the optimized indexes is more than that of the two basic indexes.


Table 10  The time (ms) of construction for four indexes
Dataset I_D I_C I_D+ I_C+
email-Eu-core 2 2 15 16
amaze 1 1 19 19
arxiv 6 7 45 51
yago 4 4 39 41
pubmed 4 4 19 19
mtbrv 2 1 55 57
citeseer 4 5 18 18
anthra 2 1 94 93
ecoo 2 2 102 105
agrocyc 2 2 89 90
Email-Enron 18 18 168 188
Slashdot0811 45 48 508 561
Email-EuAll 35 24 88,054 77,419
web-Google 115 86 148,029 139,080
10cit-Patent 312 251 451 419
05citeseerx 431 432 6,648 6,778
05cit-Patent 579 706 1,168 1,192
WikiTalk 399 256 10,603,100 10,395,250
dbpedia 1,441 1,338 12,493,000 12,359,700
cit-Patents 2,368 4,068 9,898 10,328

Note that the construction time of I_D+ and I_C+ on Email-EuAll, web-Google, WikiTalk, and dbpedia is much longer than on the other datasets. This is because their maximum degrees are much larger than those of the other datasets, as Table 5 shows, and the time complexity of support-based pruning is O(|E| ⋅ dmax), as Sect. 6 describes.
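For reference, a minimal sketch of support-based pruning consistent with the O(|E| ⋅ dmax) bound above; the adjacency-set representation and the single-pass deletion are our assumptions, not the authors' implementation:

    def support_prune(adj, threshold=1):
        """Delete every edge whose support (number of triangles containing it)
        is no larger than `threshold`.

        `adj` maps each vertex to the set of its neighbors. The support of an
        edge (u, v) is |N(u) ∩ N(v)|; summed over all edges, this intersection
        work is bounded by O(|E| * d_max).
        """
        doomed = []
        for u in adj:
            for v in adj[u]:
                if u < v:                           # visit each undirected edge once
                    support = len(adj[u] & adj[v])  # common neighbors = triangles
                    if support <= threshold:
                        doomed.append((u, v))
        for u, v in doomed:
            adj[u].discard(v)
            adj[v].discard(u)
        return adj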

8 Conclusion

In this paper, we study top k 𝛼-MC enumeration over uncertain graphs. We first propose an index-based algorithm, namely IKC, which computes the top k 𝛼-MCs based on an index and improves efficiency by pruning unpromising vertices, thereby avoiding the enumeration of 𝛼-MCs that are not chosen. We propose a degree-based index and a core-based index, which sort vertices according to their degrees or core numbers, respectively. Such orderings help to produce large 𝛼-MCs as early as possible, so that the pruning condition can prune more vertices. We also propose a support-based pruning strategy that deletes the edges with support values no larger than 1, which further reduces the size of the input graph and speeds up the enumeration. Our experimental results on 20 real-world datasets show that IKC based on the degree-based index is on average 183 times faster than Baseline and 31 times faster than the state-of-the-art algorithm, and that IKC based on the core-based index is on average 720 times faster than Baseline and 219 times faster than the state-of-the-art algorithm.


Furthermore, IKC with support-based pruning is on average 131 times faster than IKC based only on the degree-based index, and IKC using an index with support-based pruning is on average 55 times faster than IKC without support-based pruning.
Our algorithms perform better on sparse graphs because our index and pruning strategy help to avoid processing unpromising vertices and computing the not-chosen 𝛼-MCs. However, dense graphs contain very few vertices with small degrees or core numbers, so our pruning strategy is not as effective as it is on sparse graphs. In the future, we plan to study better pruning strategies for dense graphs.

Acknowledgements This work was partly supported by grants from the Natural Science Foundation of Shanghai (No. 20ZR1402700) and from the Natural Science Foundation of China (Nos. 61472339 and 61873337).

References
1. Gibson D, Kumar R, Tomkins A (2005) Discovering large dense subgraphs in massive graphs. In:
Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway,
August 30 - September 2, 2005, pp. 721–732 http://​www.​vldb.​org/​archi​ves/​websi​te/​2005/​progr​am/​
paper/​thu/​p721-​gibson.​pdf
2. Qin L, Li R, Chang L, Zhang C (2015) Locally densest subgraph discovery. In: Proceedings of the 21th
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW,
Australia, August 10-13, 2015, pp. 965–974 https://​doi.​org/​10.​1145/​27832​58.​27832​99
3. Modani N, Dey K (2009) Large maximal cliques enumeration in large sparse graphs. In: Chawla, S., Kar-
lapalem, K., Pudi, V. (eds.) Proceedings of the 15th International Conference on Management of Data,
December 9-12, 2009, International School of Information Management, Mysore, India. Computer
Society of India, India. http://​www.​cse.​iitb.​ac.​in/%​7Ecom​ad/​2009/​proce​edings/​R3_1.​pdf
4. Makino K, Uno T (2004) New algorithms for enumerating all maximal cliques. In: Hagerup, T., Kata-
jainen, J. (eds.) Algorithm Theory - SWAT 2004, 9th Scandinavian Workshop on Algorithm Theory,
Humlebaek, Denmark, July 8-10, 2004, Proceedings. Lecture Notes in Computer Science, vol. 3111,
pp. 260–272. Springer, Germany. https://​doi.​org/​10.​1007/​978-3-​540-​27810-8_​23
5. Mukherjee AP, Xu P, Tirthapura S (2015) Mining maximal cliques from an uncertain graph. In: 31st IEEE
International Conference on Data Engineering, ICDE 2015, Seoul, South Korea, April 13-17, 2015, pp.
243–254. https://​doi.​org/​10.​1109/​ICDE.​2015.​71132​88
6. Zou Z, Li J, Gao H, Zhang S (2010) Finding top-k maximal cliques in an uncertain graph. In: Proceedings of
the 26th International Conference on Data Engineering, ICDE 2010, March 1-6, 2010, Long Beach, Cali-
fornia, USA, pp. 649–652 https://​doi.​org/​10.​1109/​ICDE.​2010.​54478​91
7. Li R, Dai Q, Wang G, Ming Z, Qin L, Yu JX (2019) Improved algorithms for maximal clique search in uncer-
tain networks. In: 35th IEEE International Conference on Data Engineering, ICDE 2019, Macao, China,
April 8-11, 2019, pp. 1178–1189. IEEE, New York https://​doi.​org/​10.​1109/​ICDE.​2019.​00108
8. Rashid A, Kamran M, Halim Z (2019) A top down approach to enumerate 𝛼-maximal cliques in uncertain graphs. J Intell Fuzzy Syst 36(4):3129–3141. https://doi.org/10.3233/JIFS-18263
9. Márquez R, Weber R (2019) Overlapping community detection in static and dynamic social networks. In: Pro-
ceedings of the Twelfth ACM International Conference on Web Search and Data Mining, WSDM 2019,
Melbourne, VIC, Australia, February 11-15, 2019, pp. 822–823 https://​doi.​org/​10.​1145/​32896​00.​32916​02
10. Xiao D, Du N, Wu B, Wang B (2007) Community ranking in social network. In: Proceeding of the Second
International Multi-Symposium of Computer and Computational Sciences (IMSCCS 2007), August 13-15,
2007, The University of Iowa, Iowa City, Iowa, USA, pp. 322–329. IEEE Computer Society, Los Alamitos,
CA, USA https://​doi.​org/​10.​1109/​IMSCCS.​2007.​31
11. Meena J, Devi VS (2015) Overlapping community detection in social network using disjoint community
detection. In: IEEE Symposium Series on Computational Intelligence, SSCI 2015, Cape Town, South
Africa, December 7-10, 2015, pp. 764–771 https://​doi.​org/​10.​1109/​SSCI.​2015.​114


12. Bai L, Cheng X, Liang J, Guo Y (2017) Fast graph clustering with a new description model for community
detection. Inf Sci 388:37–47. https://​doi.​org/​10.​1016/j.​ins.​2017.​01.​026
13. Arab M, Afsharchi M (2014) Community detection in social networks using hybrid merging of sub-com-
munities. J Netw Comput Appl 40:73–84. https://​doi.​org/​10.​1016/j.​jnca.​2013.​08.​008
14. Newman MEJ (2001) The structure of scientific collaboration networks. Proc Natl Acad Sci USA
15. Alduaiji N, Datta A, Li J (2018) Influence propagation model for clique-based community detection in
social networks. IEEE Trans Comput Soc Syst 5(2):563–575. https://​doi.​org/​10.​1109/​TCSS.​2018.​28316​94
16. Cheng J, Ke Y, Fu AW, Yu JX, Zhu L (2010) Finding maximal cliques in massive networks by h*-graph. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, Indiana, USA, June 6-10, 2010, pp. 447–458 https://doi.org/10.1145/1807167.1807217
17. Chen W, Wang Y, Yang S (2009) Efficient influence maximization in social networks. In: Proceedings
of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris,
France, June 28 - July 1, 2009, pp. 199–208 https://​doi.​org/​10.​1145/​15570​19.​15570​47
18. Pathak N, Mane S, Srivastava J (2006) Who thinks who knows who? Socio-cognitive analysis of email
networks. In: Proceedings of the 6th IEEE International Conference on Data Mining (ICDM 2006), 18-22
December 2006, Hong Kong, China, pp. 466–477 https://​doi.​org/​10.​1109/​ICDM.​2006.​168
19. Hooi B, Song HA, Beutel A, Shah N, Shin K, Faloutsos C (2016) FRAUDAR: Bounding graph fraud in the
face of camouflage. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pp. 895–904 https://​doi.​org/​10.​
1145/​29396​72.​29397​47
20. Bron C, Kerbosch J (1973) Finding all cliques of an undirected graph (algorithm 457). Commun ACM
16(9):575–576
21. Tomita E, Tanaka A, Takahashi H (2006) The worst-case time complexity for generating all maximal
cliques and computational experiments. Theor Comput Sci 363(1):28–42. https://​doi.​org/​10.​1016/j.​tcs.​
2006.​06.​015
22. Eppstein D, Löffler M, Strash D (2013) Listing all maximal cliques in large sparse real-world graphs. ACM
J Exp Algorithmics. https://​doi.​org/​10.​1145/​25436​29
23. Aggarwal CC (2009) Managing and Mining Uncertain Data. Advances in Database Systems, vol. 35. Klu-
wer, Netherlands https://​doi.​org/​10.​1007/​978-0-​387-​09690-2
24. Adar E, Ré C (2007) Managing uncertainty in social networks. IEEE Data Eng Bull 30(2):15–22
25. Liben-Nowell D, Kleinberg JM (2007) The link-prediction problem for social networks. J Assoc Inf Sci
Technol 58(7):1019–1031. https://​doi.​org/​10.​1002/​asi.​20591
26. Liben-Nowell D, Kleinberg JM (2003) The link prediction problem for social networks. In: Proceedings
of the 2003 ACM CIKM International Conference on Information and Knowledge Management, New
Orleans, Louisiana, USA, November 2-8, 2003, pp. 556–559 https://​doi.​org/​10.​1145/​956863.​956972
27. Boldi P, Bonchi F, Gionis A, Tassa T (2012) Injecting uncertainty in graphs for identity obfuscation. Proc
VLDB Endow 5(11):1376–1387
28. Kuter U, Golbeck J (2010) Using probabilistic confidence models for trust inference in web-based social
networks. ACM Trans Internet Techn 10(2):8–1823. https://​doi.​org/​10.​1145/​17543​93.​17543​97
29. Mehmood Y, Bonchi F, García-Soriano D (2016) Spheres of influence for more effective viral marketing.
In: Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016,
San Francisco, CA, USA, June 26 - July 01, 2016, pp. 711–726 https://​doi.​org/​10.​1145/​28829​03.​29152​50
30. Kawahigashi H, Terashima Y, Miyauchi N, Nakakawaji T (2005) Modeling ad hoc sensor networks using
random graph theory. In: 2nd IEEE Consumer Communications and Networking Conference, CCNC 2005,
Las Vegas, NV, USA, January 3-6, pp. 104–109 (2005). https://​doi.​org/​10.​1109/​CCNC.​2005.​14051​52
31. Yang B, Wen D, Qin L, Zhang Y, Chang L, Li R (2019) Index-based optimal algorithm for computing
k-cores in large uncertain graphs. In: 35th IEEE International Conference on Data Engineering, ICDE
2019, Macao, China, April 8-11, 2019, pp. 64–75. IEEE, New York https://​doi.​org/​10.​1109/​ICDE.​2019.​
00015
32. Abu-khzam FN, Baldwin NE, Langston MA, Samatova NF (2005) On the relative efficiency of maximal
clique enumeration algorithms, with application to high-throughput. In: Computational Biology, Proceed-
ings, International Conference on Research Trends in Science and Technology
33. Koch I, Lengauer T, Wanke E (1996) An algorithm for finding maximal common subtopologies in a set of
protein structures. J Comput Biol 3(2):289–306. https://​doi.​org/​10.​1089/​cmb.​1996.3.​289
34. Saha B, Hoch A, Khuller S, Raschid L, Zhang X (2010) Dense subgraphs with restrictions and applications
to gene annotation graphs. In: Research in Computational Molecular Biology, 14th Annual International


Conference, RECOMB 2010, Lisbon, Portugal, April 25-28, 2010. Proceedings, pp. 456–472 https://​doi.​
org/​10.​1007/​978-3-​642-​12683-3_​30
35. Rual JF, Venkatesan K, Tong H, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze
M, Ayivi-Guedehoussou N (2005) Towards a proteome-scale map of the human protein-protein interaction
network. Nature 437(7062):1173–8
36. Wang W (2007) Emergence of a DNA-damage response network consisting of Fanconi anaemia and BRCA proteins. Nat Rev Genet 8(10):735
37. Lu Y, Huang R, Huang D (2019) Mining highly reliable dense subgraphs from uncertain graphs. KSII
Trans Internet Inf Syst 13(6):2986–2999. https://​doi.​org/​10.​3837/​tiis.​2019.​06.​012
38. Yuan L, Qin L, Lin X, Chang L, Zhang W (2015) Diversified top-k clique search. In: 31st IEEE Interna-
tional Conference on Data Engineering, ICDE 2015, Seoul, South Korea, April 13-17, 2015, pp. 387–398
(2015). https://​doi.​org/​10.​1109/​ICDE.​2015.​71133​00
39. Sanei-Mehri S, Das A, Tirthapura S (2018) Enumerating top-k quasi-cliques. In: IEEE International Con-
ference on Big Data, Big Data 2018, Seattle, WA, USA, December 10-13, 2018, pp. 1107–1112 https://​doi.​
org/​10.​1109/​BigDa​ta.​2018.​86223​52
40. Wu J, Li C, Jiang L, Zhou J, Yin M (2020) Local search for diversified top-k clique search problem. Com-
put Op Res 116:104867. https://​doi.​org/​10.​1016/j.​cor.​2019.​104867
41. Zou Z, Zhu R (2013) Mining top-k maximal cliques from large uncertain graphs. Chinese Journal of Computers, pp. 2146–2155
42. Bai J, Zhou J, Du M, Zhong P (2021) Efficient (k, 𝛼)-maximal-cliques enumeration over uncertain graphs.
IEEE Access 9:149338–149348. https://​doi.​org/​10.​1109/​ACCESS.​2021.​31251​98
43. Batagelj V, Zaversnik M (2003) An o(m) algorithm for cores decomposition of networks. Comput Sci
1(6):34–37
44. Che Y, Lai Z, Sun S, Wang Y, Luo Q (2020) Accelerating truss decomposition on heterogeneous proces-
sors. Proc VLDB Endow 13(10):1751–1764

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
