
Bioinformatics, 31(14), 2015, 2409–2411

doi: 10.1093/bioinformatics/btv161
Advance Access Publication Date: 19 March 2015
Applications Note

Systems biology

Downloaded from https://academic.oup.com/bioinformatics/article/31/14/2409/256266 by guest on 12 October 2021

MAGNA++: Maximizing Accuracy in Global Network Alignment via both node and edge conservation
V. Vijayan1, V. Saraph2 and T. Milenković1,*
1Department of Computer Science and Engineering, ECK Institute for Global Health, Interdisciplinary Center for
Network Science and Application, University of Notre Dame, IN 46556, USA and 2Department of Computer
Science, Brown University, Providence, RI 02912, USA
*To whom correspondence should be addressed.
Associate Editor: Jonathan Wren
Received on November 20, 2014; revised on January 31, 2015; accepted on March 14, 2015

Abstract
Motivation: Network alignment aims to find conserved regions between different networks.
Existing methods aim to maximize total similarity over all aligned nodes (i.e. node conservation).
Then, they evaluate alignment quality by measuring the amount of conserved edges, but only after
the alignment is constructed. Thus, we recently introduced MAGNA (Maximizing Accuracy in
Global Network Alignment) to directly maximize edge conservation while producing alignments
and showed its superiority over the existing methods. Here, we extend the original MAGNA with
several important algorithmic advances into a new MAGNA++ framework.
Results: MAGNA++ introduces several novelties: (i) it simultaneously maximizes any one of three
different measures of edge conservation (including our recent superior S3 measure) and any
desired node conservation measure, which further improves alignment quality compared with
maximizing only node conservation or only edge conservation; (ii) it speeds up the original
MAGNA algorithm by parallelizing it to automatically use all available resources, as well as by
reimplementing the edge conservation measures more efficiently; (iii) it provides a friendly graph-
ical user interface for easy use by domain (e.g. biological) scientists; and (iv) at the same time,
MAGNA++ offers source code for easy extensibility by computational scientists.
Availability and implementation: http://www.nd.edu/cone/MAGNA++/
Contact: tmilenko@nd.edu

1 Introduction
Proteins produced by genes interact to carry out cellular processes, which can be modeled by
protein–protein interaction (PPI) networks. PPI network alignment can be used to find a node mapping
between networks of different species that identifies similar regions between the networks (Sharan
and Ideker, 2006; Clark and Kalita, 2014). Consequently, it can be used to predict protein function by
transferring knowledge from the network of a well-studied species to the network of a poorly studied
species, or to reconstruct species’ phylogenetic relationships based on similarities of their networks
(Liao et al., 2009; Kuchaiev et al., 2010; Milenković et al., 2010; Kuchaiev and Pržulj, 2011; Patro
and Kingsford, 2012; Faisal et al., 2014; Saraph and Milenković, 2014; Sun et al., 2014).
Traditionally, existing methods first compute pairwise node similarities between networks and then
they find a high-scoring alignment that maximizes (greedily or optimally) the total similarity over
all aligned nodes (or node conservation) (Faisal et al., 2014; Crawford et al., 2014). However,
alignment quality is then evaluated with respect to a measure of edge conservation. Thus, traditional
methods aim to conserve edges by aligning nodes that are similar.
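
The two-step traditional pipeline described above (maximize node similarity first, measure edge conservation afterwards) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the greedy matching strategy and the toy networks are hypothetical and are not taken from any of the cited aligners.

```python
def greedy_align(sim):
    """Greedily build an injective mapping u -> v by descending similarity.

    `sim` maps (u, v) pairs to a similarity score, with u from the
    smaller network and v from the larger one.
    """
    alignment, used = {}, set()
    for (u, v), _score in sorted(sim.items(), key=lambda kv: kv[1], reverse=True):
        if u not in alignment and v not in used:
            alignment[u] = v
            used.add(v)
    return alignment


def conserved_edges(edges1, edges2, alignment):
    """Count edges of the smaller network whose images are edges of the larger one."""
    larger = {frozenset(e) for e in edges2}
    return sum(frozenset((alignment[u], alignment[w])) in larger for u, w in edges1)


# Hypothetical toy data: a path a-b-c aligned to a path x-y-z-w.
edges1 = [("a", "b"), ("b", "c")]
edges2 = [("x", "y"), ("y", "z"), ("z", "w")]
sim = {
    ("a", "x"): 0.90, ("a", "y"): 0.20, ("a", "z"): 0.10, ("a", "w"): 0.05,
    ("b", "x"): 0.30, ("b", "y"): 0.80, ("b", "z"): 0.40, ("b", "w"): 0.15,
    ("c", "x"): 0.12, ("c", "y"): 0.25, ("c", "z"): 0.35, ("c", "w"): 0.70,
}

# Step 1: the alignment is built from node similarities only.
alignment = greedy_align(sim)
node_total = sum(sim[(u, v)] for u, v in alignment.items())

# Step 2: edge conservation is measured only after the alignment exists.
edge_count = conserved_edges(edges1, edges2, alignment)
```

On this toy input the greedy step maps c to w because of the high similarity of that pair, so only one of the two edges of the smaller network is conserved, whereas mapping c to z would conserve both. This is the gap between node-based construction and edge-based evaluation that the text points out.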

© The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

In contrast, our recent MAGNA (Maximizing Accuracy in Global Network Alignment) (Saraph and
Milenković, 2014) directly maximizes edge conservation while producing alignments. MAGNA was shown to
outperform the existing methods, including IsoRank (Singh et al., 2007), MI-GRAAL (Kuchaiev and
Pržulj, 2011) and GHOST (Patro and Kingsford, 2012), in terms of both node and edge conservation.
Importantly, in addition to constructing its own superior alignments from scratch, MAGNA can combine
alignments of existing methods to further improve them.
As simultaneously maximizing both node and edge conservation could further improve alignment quality
(Neyshabur et al., 2013; Crawford and Milenković, 2014; Sun et al., 2014), here we extend MAGNA into a
new MAGNA++ framework, which (i) allows for maximizing a combination of any node conservation (i.e.
similarity) measure and any one of three currently implemented edge conservation measures; (ii)
parallelizes the original MAGNA algorithm via multi-threading and reimplements MAGNA’s edge
conservation measures more efficiently to decrease running time; (iii) implements a friendly
graphical user interface (GUI) for easy use by domain scientists; and (iv) makes available the source
code for easy extensibility by computational scientists.

2 MAGNA++ details

2.1 Simultaneous node and edge conservation
Given node and edge conservation measures S_N and S_E, respectively, MAGNA++ maximizes
αS_E + (1 − α)S_N, where α is a parameter between 0 and 1 that controls the contribution of each of
the two measures. Importantly, when we repeat the same analyses as in the original MAGNA paper, we
show that maximizing both node and edge conservation always improves both node and edge alignment
quality compared with optimizing only node conservation (as the existing methods do) or only edge
conservation (as the original MAGNA does) (Fig. 1). Here, we have used the total graphlet degree
vector similarity (GDV-similarity) over all aligned nodes (Milenković and Pržulj, 2008; Milenković et
al., 2010) as a measure of node conservation (but other measures, such as protein sequence
similarity, can also be used), and we have used our recent superior S3 (Saraph and Milenković, 2014)
as a measure of edge conservation. We have aligned (i) a high-confidence yeast (Y) PPI network with
its five lower-confidence (LC) counterparts, (ii) the same yeast (Y) network with its five randomly
rewired (RW) counterparts, and (iii) three pairs of real-world PPI networks of different species; for
(i) and (ii), we know the true node mapping and so we can calculate the accuracy of the alignment in
terms of node correctness (NC) (Fig. 1) (Saraph and Milenković, 2014).

2.2 Speedup via parallelization and faster calculation of edge conservation
The original MAGNA is a genetic algorithm that combines existing ‘parent’ alignments into superior
‘children’ alignments and then evolves this process over multiple generations. Its main computational
bottleneck is the calculation of quality for each alignment in each generation, which is needed in
order to select the highest-scoring alignments for the next generation. Because the quality of each
alignment can be calculated independently, MAGNA++, unlike the original single-threaded MAGNA,
divides the calculation among multiple threads in a way that achieves a dramatic (up to linear)
speedup (Fig. 2a). Parallelizing MAGNA++ required changes in the implementation of the original MAGNA
in order to account for data structures that are appropriate for achieving the speedup. Namely, using
adjacency matrices rather than adjacency lists is faster when dealing with a single thread. However,
we found that simply parallelizing the adjacency matrix-based implementation did not realize high
speedup, because of non-trivial cache contention issues with adjacency matrices when using multiple
threads. Thus, to achieve high enough speedup with multiple threads (Fig. 2a), the MAGNA++
implementation instead uses adjacency lists.
We achieve additional speedup by calculating edge conservation faster. The time complexity of
calculating the edge-based alignment quality measures in the original MAGNA, by iteratively exploring
|E1| edges in the smaller network and a subset of |E2| edges in the larger network that participate
in the aligned nodes, is O(|E1| log |E2|) (Saraph and Milenković, 2014). In MAGNA++, we formulate a
new and faster method to calculate this. Namely, we create a composite graph on |E1| edges from the
smaller network and the same subset of |E2| edges from the larger network and then simply count the
number of conserved edges (Fig. 2b). This decreases the above time complexity to O(|E1| + |E2|). We
validate this speedup empirically as well (Fig. 2c).

2.3 User-friendly GUI plus source code
MAGNA++ has a friendly and intuitive GUI for Windows, Mac OS X and Unix, plus a command line
interface (see its website for a tutorial). The only required inputs are the two networks to be
aligned and the output file name. To maximize a node conservation measure in addition to the default
S3 edge conservation measure, the user can specify the α parameter value and the file with pairwise
node similarities (with respect to the desired node conservation measure). The GUI also allows for
advanced options dealing with parallelization of

[Figure 1: one panel per network pair (Y–Y5%LC, Y–Y10%LC, Y–Y15%LC, Y–Y20%LC, Y–Y25%LC, Y–Y5%RW,
Y–Y10%RW, Y–Y15%RW, Y–Y20%RW, Y–Y25%RW, C. jejuni–E. coli, Meso.–Syne., Yeast–Human). y-axis:
alignment quality score; x-axis: α; series: S3, NM, NC.]

Fig. 1. Alignment quality in terms of NC, GDV-similarity node conservation measure (NM) and S3 edge
conservation measure (S3), when optimizing with MAGNA++ node conservation only (left; α = 0), edge
conservation only (right; α = 1; the original MAGNA), or a combination of node and edge conservation
(middle; α in the (0, 1) range). We show results for the same 13 synthetic and real-world network
pairs (shown at the top of the figure) as in the original MAGNA publication (Saraph and Milenković,
2014). Recall that we can compute NC for the synthetic but not real-world networks.
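
The three settings compared in Fig. 1 differ only in the weight α of the objective αS_E + (1 − α)S_N from Section 2.1. A minimal Python sketch of that objective follows, under stated assumptions. The S3 formula used here (conserved edges divided by the number of distinct edges among the mapped smaller-network edges and the larger-network edges induced on the aligned nodes) is our reading of Saraph and Milenković (2014) and is not spelled out in this note. S_N is taken as the mean pairwise similarity of the aligned nodes so that it lies in [0, 1], which is also an assumption. All function names and the toy data are hypothetical and are not MAGNA++ code.

```python
def s3(edges1, edges2, alignment):
    """S3 edge conservation (formula assumed, see the lead-in)."""
    image = set(alignment.values())
    # Edges of the smaller network, mapped into the larger network.
    mapped = {frozenset((alignment[u], alignment[w])) for u, w in edges1}
    # Edges of the larger network between nodes that are images of aligned nodes.
    induced = {frozenset((p, q)) for p, q in edges2 if p in image and q in image}
    conserved = len(mapped & induced)
    denom = len(mapped) + len(induced) - conserved
    return conserved / denom if denom else 0.0


def node_conservation(sim, alignment):
    """Mean pairwise similarity of aligned nodes (normalization is an assumption)."""
    return sum(sim[(u, v)] for u, v in alignment.items()) / len(alignment)


def objective(alpha, edges1, edges2, sim, alignment):
    """alpha * S_E + (1 - alpha) * S_N, with S_E taken to be S3."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    s_e = s3(edges1, edges2, alignment)
    s_n = node_conservation(sim, alignment)
    return alpha * s_e + (1.0 - alpha) * s_n


# Hypothetical toy data: a path a-b-c aligned to a path x-y-z-w.
edges1 = [("a", "b"), ("b", "c")]
edges2 = [("x", "y"), ("y", "z"), ("z", "w")]
sim = {("a", "x"): 0.9, ("b", "y"): 0.8, ("c", "z"): 0.4, ("c", "w"): 0.7}

by_nodes = {"a": "x", "b": "y", "c": "w"}  # higher node similarity
by_edges = {"a": "x", "b": "y", "c": "z"}  # higher edge conservation

scores = {
    alpha: (objective(alpha, edges1, edges2, sim, by_nodes),
            objective(alpha, edges1, edges2, sim, by_edges))
    for alpha in (0.0, 0.5, 1.0)
}
```

With α = 0 the objective reduces to node conservation and favors `by_nodes`; with α = 1 it reduces to S3 and favors `by_edges`; intermediate values trade the two off.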

the genetic algorithm. The output consists of the alignment (the list of aligned nodes from the input
networks), the statistics regarding the alignment quality and the common conserved subgraph (Fig. 2b),
which can then be analyzed in any network visualization tool for an intuitive interpretation of the
alignment. For details, see the MAGNA++ tutorial. MAGNA++ also provides the source code. Thus,
computational scientists can easily extend it to, for example, modify MAGNA’s alignment crossover
function (which is at the heart of its genetic algorithm) or add additional node or edge conservation
measures.

Fig. 2. The speedup of MAGNA++. (a) The parallelization speedup is shown as a function of the number
of threads (cores), when MAGNA++ is run on a 64-core machine, for the three real-world network pairs
and the five synthetic yeast (Y) low-confidence (LC) network pairs from Fig. 1. We compute speedup as
the ratio of the time to run the process with one thread (the original MAGNA’s time) to the time to
run the process with k threads. (b) Given two networks and their alignment (i.e. node mapping
indicated by broken arrows) shown at the top, their composite graph is shown at the bottom. Using this
composite graph idea allows for achieving additional speedup via faster calculation of edge
conservation. In this graph, double edges (indicated with presence of both a blue edge and a red edge
between the same aligned node pairs) are conserved under the given alignment. These edges, along with
their participating nodes, form the common conserved subgraph between the networks. One of the outputs
of MAGNA++ is this common conserved subgraph, which allows one to visualize the given alignment in an
intuitive manner. (c) Comparison of new MAGNA++’s and original MAGNA’s S3 implementations is shown in
terms of running time (in seconds), with respect to the same evaluation test, when both MAGNA++ and
MAGNA are fairly run as single threads, for the same networks as in panel (a). Clearly, for each pair
of networks, the new S3 implementation in MAGNA++ is always faster than the original S3 implementation
in MAGNA.

3 Conclusion
MAGNA is an already proven network aligner. MAGNA++ is its novel extension that allows for higher
alignment quality, lower computational complexity, intuitive use by domain scientists and easy
functional extensibility by computational scientists.

Funding
This work was supported by the National Science Foundation [CCF-1319469].

Conflict of Interest: none declared.

References
Clark,C. and Kalita,J. (2014) A comparison of algorithms for the pairwise alignment of biological
networks. Bioinformatics, 30, 2351–2359.
Crawford,J. and Milenković,T. (2014) GREAT: GRaphlet Edge-based network AlignmenT. arXiv:1410.5103
[q-bio.MN].
Crawford,J. et al. (2014) Fair evaluation of global network aligners. arXiv:1407.4824 [q-bio.MN].
Faisal,F. et al. (2014) Global network alignment in the context of aging. IEEE/ACM Trans. Comput.
Biol. Bioinform., 12, 40–52.
Kuchaiev,O. and Pržulj,N. (2011) Integrative network alignment reveals large regions of global network
similarity in yeast and human. Bioinformatics, 27, 1390–1396.
Kuchaiev,O. et al. (2010) Topological network alignment uncovers biological function and phylogeny.
J. R. Soc. Interface, 7, 1341–1354.
Liao,C. et al. (2009) IsoRankN: spectral methods for global alignment of multiple protein networks.
Bioinformatics, 25, i253–i258.
Milenković,T. and Pržulj,N. (2008) Uncovering biological network function via graphlet degree
signatures. Cancer Inform., 6, 257–273.
Milenković,T. et al. (2010) Optimal network alignment with graphlet degree vectors. Cancer Inform.,
9, 121–137.
Neyshabur,B. et al. (2013) NETAL: a new graph-based method for global alignment of protein-protein
interaction networks. Bioinformatics, 29, 1654–1662.
Patro,R. and Kingsford,C. (2012) Global network alignment using multiscale spectral signatures.
Bioinformatics, 28, 3105–3114.
Saraph,V. and Milenković,T. (2014) MAGNA: maximizing accuracy in global network alignment.
Bioinformatics, 30, 2931–2940.
Sharan,R. and Ideker,T. (2006) Modeling cellular machinery through biological network comparison.
Nat. Biotechnol., 24, 427–433.
Singh,R. et al. (2007) Pairwise global alignment of protein interaction networks by matching
neighborhood topology. In: Speed,T. and Huang,H. (eds), Research in Computational Molecular Biology.
Springer, Berlin, Germany, pp. 16–31.
Sun,Y. et al. (2014) Simultaneous optimization of both node and edge conservation in network
alignment via WAVE. arXiv:1410.3301 [q-bio.MN].
