
BRIEFINGS IN BIOINFORMATICS. VOL 15. NO 2. 195–211. doi:10.1093/bib/bbt034
Advance Access published on 21 May 2013

Supervised, semi-supervised and unsupervised inference of gene regulatory networks

Stefan R. Maetschke, Piyush B. Madhamshettiwar, Melissa J. Davis and Mark A. Ragan
Submitted: 19th January 2013; Received (in revised form): 15th April 2013

Abstract

Downloaded from https://academic.oup.com/bib/article/15/2/195/212108 by guest on 05 August 2023


Inference of gene regulatory networks from expression data is a challenging task. Many methods have been developed for this purpose, but a comprehensive evaluation that covers unsupervised, semi-supervised and supervised methods, and provides guidelines for their practical application, is lacking.
We performed an extensive evaluation of inference methods on simulated and experimental expression data. The results reveal low prediction accuracies for unsupervised techniques, with the notable exception of the Z-SCORE method on knockout data. In all other cases, the supervised approach achieved the highest accuracies and, even in a semi-supervised setting with small numbers of only positive samples, outperformed the unsupervised techniques.
Keywords: gene regulatory networks; simulation; gene expression data; machine learning

INTRODUCTION

Mapping the topology of gene regulatory networks is a central problem in systems biology. The regulatory architecture controlling gene expression also controls consequent cellular behavior such as development, differentiation, homeostasis and response to stimuli, while deregulation of these networks has been implicated in oncogenesis and tumor progression [1]. Experimental methods based, for example, on chromatin immunoprecipitation, DNaseI hypersensitivity or protein-binding assays are capable of determining the nature of gene regulation in a given system, but are time-consuming, expensive and require antibodies for each transcription factor [2]. Accurate computational methods to infer gene regulatory networks, particularly methods that leverage genome-scale experimental data, are urgently required not only to supplement empirical approaches but also, if possible, to explore these data in new, more integrative ways.

Many computational methods have been developed to infer regulatory networks from gene expression data, predominantly using unsupervised techniques. Several comparisons have been made of network inference methods, but a comprehensive evaluation that covers unsupervised, semi-supervised and supervised methods is lacking, and many questions remain open. Here we address fundamental questions, including which methods are suitable for what kinds of experimental data types, and how many samples these methods require. In the following, we first review large-scale comparisons (more than five methods), before discussing evaluations focused on supervised and semi-supervised methods. Finally, we discuss the remaining smaller comparisons with an application-specific focus.

Corresponding author. Mark Ragan, The University of Queensland, Institute for Molecular Bioscience and ARC Centre of Excellence in Bioinformatics, Brisbane, QLD 4072, Australia. Tel.: +61 7 3346 2616; Fax: +61 7 3346 2101; E-mail: m.ragan@uq.edu.au
Stefan Maetschke is a computer scientist, and his current research focus is on machine learning techniques and their application to the
inference of gene regulatory and protein–protein interaction networks.
Piyush Madhamshettiwar is currently completing a PhD in bioinformatics. His work is focused on the inference and analysis of gene
regulatory networks, and the development of integrative analytical workflows for cancer transcriptomics.
Melissa Davis is a computational cell biologist with research interests in genome-scale biological network and pathway analysis, and
the integration of experimental data with knowledge-based modeling.
Mark Ragan has research interests in comparative mammalian and bacterial genomics, genome-scale bioinformatics and computa-
tional systems biology.

© The Author 2013. Published by Oxford University Press.


This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
The most recent and largest comparison so far has been performed by Madhamshettiwar et al. [3]. They compared the prediction accuracy of eight unsupervised and one supervised method on 38 simulated data sets. The methods showed large differences in prediction accuracy, but the supervised method was found to perform best, despite the parameters of the unsupervised methods having been optimized. Here we extend this study to 17 unsupervised methods and include a direct comparison with supervised and semi-supervised methods on a wide range of networks and experimental data types (knockout, knockdown and multifactorial).

Another comprehensive evaluation, limited to unsupervised methods, has been performed as part of the Dialogue for Reverse Engineering Assessments and Methods (DREAM), an annual open competition in network inference [4–8]. Results from DREAM highlight that network inference is a challenging problem. To quote Prill et al. [7], 'The vast majority of the teams' predictions were statistically equivalent to random guesses'. However, an important result of the DREAM competition is that under certain conditions, simple methods can perform well: '... the z-score prediction would have placed second, first, and first (tie) in the 10-node, 50-node, and 100-node subchallenges, respectively' [7].

Unsupervised methods rely on expression data only but tend to achieve lower prediction accuracies than supervised methods [3, 9, 10]. By contrast, supervised methods require information about known interactions for training, and this information is typically sparse. Semi-supervised methods reflect a compromise and can be trained with much fewer interaction data, but usually are not as accurate predictors as supervised methods. One of the few comparisons with supervised methods was performed by Mordelet and Vert [9]. They evaluated supervised inference of regulatory networks (SIRENE) in comparison to context likelihood of relatedness (CLR), the algorithm for the reconstruction of accurate cellular networks (ARACNE), relevance networks (RN) and a Bayesian network (BN) on an Escherichia coli benchmark data set by Faith et al. [11], and found that the supervised method considerably outperformed the unsupervised techniques.

Cerulo et al. [10] compared supervised and semi-supervised support vector machines (SVMs) with two unsupervised methods and found the former superior. Our evaluation uses similar supervised and semi-supervised methods but includes many more unsupervised methods, distinguishes between experimental types and performs repeats, resulting in a more complete picture. A related evaluation by Schaffter et al. [12] compared six unsupervised methods on larger networks with 100, 200 and 500 nodes and simulated expression data. Again the z-score method was found to be one of the top performers in knockout experiments.

Several smaller evaluations have been performed but are largely restricted to four unsupervised methods (ARACNE, CLR, MRNET and RN) in comparisons with a novel approach on small data sets. The ARACNE method was introduced by Margolin et al. [13] and showed superior precision and recall when compared with RN and a BN algorithm on simulated networks. Meyer et al. [14] compared all four unsupervised inference algorithms on large yeast subnetworks (100 up to 1000 nodes) using simulated expression data, and Altay and Emmert-Streib [15] investigated the bias in the predictions of those algorithms. Faith et al. [11] evaluated CLR, ARACNE, RN and a linear regression model on E. coli interaction data from RegulonDB and found CLR to outperform the other methods. Lopes et al. [16] studied the prediction accuracy of ARACNE, MRNET, CLR and SFFS+MCE, a feature selection algorithm, on simulated networks and found the latter superior for networks with small node degree. Haynes and Brent [17] developed a synthetic regulatory network generator (GRENDEL) and measured the prediction accuracy of ARACNE, CLR, DBmcmc and Symmetric-N for various network sizes and different experimental types. Werhli et al. [18] compared RN, graphical Gaussian models (GGMs) and BNs on the Raf pathway, a small cellular signaling network with 11 proteins, and on simulated data. BNs and GGMs were found to outperform RN on observational data. Camacho et al. [19] compared regulatory strengths analysis, reverse engineering by multiple regression, partial correlations (PC) and dynamic BNs on a small simulated network with 10 genes, with different levels of noise. In the noise-free scenario, the PC method showed the highest accuracy. Finally, Cantone et al. [20] constructed a small, synthetic, in vivo network of five genes and measured time series and steady-state expression. In an evaluation of BANJO, ARACNE and two models based on ordinary differential equations, they found the latter two to achieve the highest accuracies. Bansal et al. [21] also evaluated BANJO, ARACNE and ordinary differential equations, but on random networks and simulated expression data.
In the following sections, we first describe the different inference methods in detail, before evaluating their prediction accuracies on simulated and experimental gene expression data and regulatory networks of varying size. We continue with a discussion of the prediction results and conclude with guidelines for the use of the evaluated methods.

METHODS

We compared the prediction accuracy of unsupervised, semi-supervised and supervised network inference methods. Unsupervised methods do not use any data to adjust internal parameters. Supervised methods, on the other hand, exploit all given data to optimize parameters such as weights or thresholds. Semi-supervised methods use only part of the data for parameter optimization, for instance, a subset of known network interactions. Note that unsupervised methods would be rendered (at least) semi-supervised by optimizing their parameters on network data.

The inference methods we evaluate here aim to recreate the topology of a genetic regulatory network, that is, a network of gene-to-gene physical regulatory interactions (some of which, however, might be hidden by shortcuts [8]), based on expression data only. In this context, the accuracy of a method is assessed by the extent to which the network it infers is similar to the true regulatory network. Because many methods are not designed to infer self-interactions or interaction direction, we disregard directed edges and self-interactions. Following others [9, 17, 22], we quantify similarity by the area under the receiver operator characteristic curve (AUC):

$$\mathrm{AUC} = \frac{1}{2}\sum_{k=1}^{n}(X_k - X_{k-1})(Y_k + Y_{k-1}) \qquad (1)$$

where $X_k$ is the false-positive rate and $Y_k$ is the true-positive rate for the k-th output in the ranked list of predicted edge weights. An AUC of 1.0 indicates perfect prediction, while an AUC of 0.5 indicates a performance no better than random prediction. Note that in contrast to other measures such as F1 score, Matthews correlation, recall or precision [23], AUC does not require the choice of a threshold to infer interactions from predicted weights; rather, it compares the predicted weights directly to the topology of the true network. In the Supplementary Material, we nonetheless report results based on F1 score and Matthews correlation.

To avoid discrepancies between the gene expression values generated by true regulatory networks and the actually known, partial networks, we performed evaluations on simulated, steady-state expression data generated from subnetworks extracted from E. coli and Saccharomyces cerevisiae networks. This allowed us to assess the accuracy of an algorithm against a perfectly known true network [21]. We used GeneNetWeaver [12, 24] and SynTReN [25] to extract subnetworks and to simulate gene expression data.

GeneNetWeaver has been part of several evaluations, most prominently the DREAM challenges. The simulator extracts subnetworks from known interaction networks such as those of E. coli and S. cerevisiae, emulates transcription and translation, and uses a set of ordinary differential equations describing chemical kinetics to generate expression data for knockout, knockdown and multifactorial experiments. To simulate knockout experiments, the expression value of each gene is in turn set to zero, whereas for knockdown experiments, the expression value is halved. In multifactorial experiments, the expression levels of a small number of genes are perturbed by a small random amount. In contrast to the DREAM challenge, we do not provide to the inference algorithms metadata such as which gene has been knocked out or knocked down. All unsupervised methods see only expression data, while supervised methods see expression data plus interaction data.

SynTReN is a similar but older simulator. Subgraphs are also extracted from E. coli and S. cerevisiae networks, but it simulates only the transcription level and multifactorial experiments. However, SynTReN is faster than GeneNetWeaver and allows one to vary the sample number independently of the network size.

To enable a comprehensive and fair comparison, we evaluated the prediction accuracies of these inference methods on subnetworks with different numbers of nodes (10, ..., 110) extracted from E. coli and S. cerevisiae, and used three experimental data types (knockout, knockdown, multifactorial) with varying sample set sizes (10, ..., 110) simulated by GeneNetWeaver and SynTReN.
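As a concrete illustration of Equation (1), the AUC can be computed directly from a ranked list of predicted edge weights. The sketch below is our own illustration, not code from the paper; it assumes at least one true and one false edge among the predictions and does not handle tied weights:

```python
def auc_from_ranking(weights, true_edges):
    """Area under the ROC curve for ranked edge predictions.

    `weights` maps gene pairs to predicted interaction weights;
    `true_edges` is the set of pairs present in the known network.
    Implements the trapezoidal sum of Equation (1): after the k-th
    prediction, (X_k, Y_k) are the false- and true-positive rates.
    """
    pos = sum(1 for e in weights if e in true_edges)
    neg = len(weights) - pos
    ranked = sorted(weights, key=weights.get, reverse=True)
    x_prev = y_prev = 0.0  # (X_0, Y_0)
    tp = fp = 0
    auc = 0.0
    for edge in ranked:
        if edge in true_edges:
            tp += 1
        else:
            fp += 1
        x, y = fp / neg, tp / pos
        auc += 0.5 * (x - x_prev) * (y + y_prev)
        x_prev, y_prev = x, y
    return auc
```

A perfect ranking (all true edges ahead of all false ones) yields 1.0; a ranking that alternates true and false edges drifts toward 0.5.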
We performed no parameter optimization for unsupervised methods because this would require training data (known interactions) and render those methods supervised. For the supervised and semi-supervised methods, 5-fold cross-validation was applied and parameters were optimized on the training data only.

In addition to simulated data, we also evaluated all methods on two experimental data sets originating from the fifth DREAM systems biology challenge [8]. Specifically, we downloaded an E. coli network with 296 regulators, 4297 genes and the corresponding expression data with 487 samples, and an S. cerevisiae network with 183 regulators, 5667 genes and expression data with 321 samples. Both data sets are described in detail in the Supplementary Material of [8]. The following sections describe the inference methods in detail.

Unsupervised

This section describes the evaluated unsupervised methods. CLR, ARACNE, MRNET and MRNET-B are part of the R package 'minet' and were called with their default parameters [26], with the exception of ARACNE. With the default parameter eps = 0.0, ARACNE performed poorly, and we used eps = 0.2 instead. Similarly, gene network inference with ensemble of trees (GENIE) [27], MINE [28] and partial correlation and information theory (PCIT) [29] were installed and evaluated with default parameters. All other methods were implemented according to their respective publications. SPEARMAN-C, EUCLID and SIGMOID are implementations of our own inference algorithms.

Correlation

Correlation-based network inference methods assume that correlated expression levels between two genes are indicative of a regulatory interaction. Correlation coefficients range from +1 to -1; a positive correlation coefficient indicates an activating interaction, while a negative coefficient indicates an inhibitory interaction. The common correlation measure by Pearson is defined as

$$\mathrm{corr}(X_i, X_j) = \frac{\mathrm{cov}(X_i, X_j)}{\sigma(X_i)\,\sigma(X_j)} \qquad (2)$$

where $X_i$ and $X_j$ are the expression levels of genes i and j, $\mathrm{cov}(\cdot,\cdot)$ denotes the covariance, and $\sigma(\cdot)$ is the standard deviation. Pearson's correlation measure assumes normally distributed values, an assumption that does not necessarily hold for gene expression data. Therefore rank-based measures are frequently used, with the measures by Spearman and Kendall being the most common. Spearman's method is simply Pearson's correlation coefficient for the ranked expression values, and Kendall's τ coefficient is computed as

$$\tau(X_i, X_j) = \frac{\mathrm{con}(X_i^r, X_j^r) - \mathrm{dis}(X_i^r, X_j^r)}{\frac{1}{2}n(n-1)} \qquad (3)$$

where $X_i^r$ and $X_j^r$ are the ranked expression profiles of genes i and j, $\mathrm{con}(\cdot,\cdot)$ denotes the number of concordant and $\mathrm{dis}(\cdot,\cdot)$ the number of discordant value pairs in $X_i^r$ and $X_j^r$, with both profiles being of length n.

Because our evaluation of prediction accuracy does not distinguish between inhibiting and activating interactions, the predicted interaction weights are computed as the absolute value of the correlation coefficients:

$$w_{ij} = |\mathrm{corr}(X_i, X_j)| \qquad (4)$$

SPEARMAN-C

SPEARMAN-C is a modification of Spearman's correlation coefficient in which we attempted to favor hub nodes, which have many strong interactions. The correlation coefficient is corrected by multiplying it by the mean correlation of gene i with all other genes k, and the absolute value is taken as the interaction weight:

$$w_{ij} = \left|\mathrm{corr}(X_i, X_j)\cdot\frac{1}{n}\sum_k^n \mathrm{corr}(X_i, X_k)\right| \qquad (5)$$

where $\mathrm{corr}(\cdot,\cdot)$ is Spearman's correlation coefficient.

Weighted gene co-expression network analysis

WGCNA [30] is a modification of correlation-based inference methods that amplifies high correlation coefficients by raising the absolute value to the power of β ('softpower'):

$$w_{ij} = |\mathrm{corr}(X_i, X_j)|^{\beta} \qquad (6)$$

with β ≥ 1. Because softpower is a nonlinear but monotonic transformation of the correlation coefficient, the prediction accuracy measured by AUC will be no different from that of the underlying correlation method itself. Consequently, we show only results for correlation methods but not for the WGCNA modification, which would be identical.
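To make Equations (2)-(4) concrete, here is a minimal pure-Python sketch of correlation-based edge weights (our own illustration, not code from the paper; tie handling in the rank transform is omitted for brevity):

```python
from statistics import mean

def pearson(x, y):
    """Pearson's correlation, Equation (2): cov(X_i, X_j) / (s(X_i) s(X_j))."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(x):
    """Rank-transform a profile (ties not handled in this sketch)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0] * len(x)
    for rank, i in enumerate(order):
        r[i] = rank + 1
    return r

def spearman(x, y):
    """Spearman's coefficient: Pearson's correlation of the ranked values."""
    return pearson(ranks(x), ranks(y))

def weight(x, y, corr=spearman):
    """Undirected interaction weight, Equation (4): |corr(X_i, X_j)|."""
    return abs(corr(x, y))
```

Applying the softpower transformation of Equation (6) on top of `weight` would only raise every value to a fixed power and therefore, as noted above, leave the AUC unchanged.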
Relevance networks

RN by Butte and Kohane [31] measure the mutual information (MI) between gene expression profiles to infer interactions. The MI I between discrete variables $X_i$ and $X_j$ is defined as

$$I(X_i, X_j) = \sum_{x_i \in X_i}\sum_{x_j \in X_j} p(x_i, x_j)\log\left(\frac{p(x_i, x_j)}{p(x_i)\,p(x_j)}\right) \qquad (7)$$

where $p(x_i, x_j)$ is the joint probability distribution of $X_i$ and $X_j$, and $p(x_i)$ and $p(x_j)$ are the marginal probabilities. $X_i$ and $X_j$ are required to be discrete variables. We used equal-width binning for discretization and empirical entropy estimation as described by Meyer et al. [26].

Context likelihood of relatedness

CLR [11] extends the RN by taking the background distribution of the MI values $I(X_i, X_j)$ into account. The most probable interactions are those that deviate most from the background distribution, and for each gene i, a maximum z-score $z_i$ is calculated as

$$z_i = \max_j\left(0,\; \frac{I(X_i, X_j) - \mu_i}{\sigma_i}\right) \qquad (8)$$

where $\mu_i$ and $\sigma_i$ are the mean value and standard deviation, respectively, of the MI values $I(X_i, X_k)$, k = 1, ..., n. The interaction weight $w_{ij}$ between two genes i and j is then defined as

$$w_{ij} = \sqrt{z_i^2 + z_j^2} \qquad (9)$$

The background correction step aims to reduce the prediction of false interactions based on spurious correlations and indirect interactions.

Algorithm for the reconstruction of accurate cellular networks

ARACNE [13] is another modification of the RN that applies the Data Processing Inequality (DPI) to filter out indirect interactions. The DPI states that, if gene i interacts with gene j via gene k, then the following inequality holds:

$$I(X_i, X_j) \le \min\left(I(X_i, X_k),\; I(X_k, X_j)\right) \qquad (10)$$

ARACNE considers all possible triplets of genes (interaction triangles) and computes the MI values for each gene pair within the triplet. Interactions within an interaction triangle are assumed to be indirect and are therefore pruned if they violate the DPI beyond a specified tolerance threshold eps. We used a threshold of eps = 0.2 for our evaluations.

Partial correlation and information theory

PCIT [29] is similar to ARACNE. PCIT extracts all possible interaction triangles and applies the DPI to filter indirect interactions, but uses partial correlation coefficients instead of MI as interaction weights. The partial correlation coefficient $\mathrm{corr}^{\mathrm{partial}}_{ij}$ between two genes i and j within an interaction triangle (i, j, k) is defined as

$$\mathrm{corr}^{\mathrm{partial}}_{ij} = \frac{\mathrm{corr}(X_i, X_j) - \mathrm{corr}(X_i, X_k)\,\mathrm{corr}(X_j, X_k)}{\sqrt{\left(1 - \mathrm{corr}(X_i, X_k)^2\right)\left(1 - \mathrm{corr}(X_j, X_k)^2\right)}} \qquad (11)$$

where $\mathrm{corr}(\cdot,\cdot)$ is Pearson's correlation coefficient. The partial correlation coefficient aims to eliminate the effect of the third gene k on the correlation of genes i and j.

MRNET

MRNET [14] uses the MI between expression profiles and a feature selection algorithm [minimum-redundancy–maximum-relevance (MRMR)] to infer interactions between genes. More precisely, the method places each gene in the role of a target gene j, with all other genes V as its regulators. The MI between the target gene and the regulators is calculated, and the MRMR method is applied to select the best subset of regulators. MRMR builds a set S step by step by selecting the genes $i_{\mathrm{MRMR}}$ with the largest MI value and the smallest redundancy, based on the following definition:

$$i_{\mathrm{MRMR}} = \underset{i \in V \setminus S}{\mathrm{argmax}}\,(s_i) \qquad (12)$$

with $s_i = u_i - r_i$. The relevance term $u_i = I(X_i, X_j)$ is thereby the MI between gene i and target j, and the redundancy term $r_i$ is defined as

$$r_i = \frac{1}{|S|}\sum_{k \in S} I(X_i, X_k) \qquad (13)$$

Interaction weights $w_{ij}$ are finally computed as $w_{ij} = \max(s_i, s_j)$.

MRNET-B

MRNET-B is a modification of MRNET that replaces the forward selection strategy for identifying the best subset of regulator genes with a backward selection strategy followed by a sequential replacement [32].
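The CLR background correction of Equations (8) and (9) can be sketched as follows, given a precomputed symmetric MI matrix. This is our own illustration, not code from the paper; following the original CLR method, the z-score is computed per gene pair before clipping at zero, and a zero background standard deviation is guarded against:

```python
import math

def clr_weights(mi):
    """CLR background correction on a symmetric MI matrix.

    For each gene i, MI values are z-scored against gene i's own
    background (mean and standard deviation over all partners) and
    clipped at zero (Equation 8); the weight for a pair combines the
    two clipped z-scores as in Equation (9).
    """
    n = len(mi)
    means, sds = [], []
    for i in range(n):
        row = [mi[i][k] for k in range(n) if k != i]
        m = sum(row) / len(row)
        s = (sum((v - m) ** 2 for v in row) / len(row)) ** 0.5
        means.append(m)
        sds.append(s or 1.0)  # guard against zero variance
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            zi = max(0.0, (mi[i][j] - means[i]) / sds[i])
            zj = max(0.0, (mi[j][i] - means[j]) / sds[j])
            w[i][j] = math.sqrt(zi ** 2 + zj ** 2)
    return w
```

A pair whose MI stands out from both genes' backgrounds receives a large weight, while pairs at or below the background level receive a weight of zero.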
Gene network inference with ensemble of trees

GENIE is similar to MRNET in that it also lets each gene take on the role of a target regulated by the remaining genes and then uses a feature selection procedure to identify the best subset of regulator genes. In contrast to MRNET, Random Forests and Extra-Trees are used for regression and feature selection [27], rather than MI and MRMR.

SIGMOID

SIGMOID models the regulation of a gene by a linear combination with soft thresholding. The predicted expression value $X'_{ik}$ of gene i at time point k is described by the sum over the weighted expression values $X_{jk}$ of the remaining genes, constrained by a sigmoid function $s(\cdot)$:

$$X'_{ik} = s\left(\sum_{j \ne i}^{n} X_{jk}\, w_{ij} + b_i\right) \qquad (14)$$

$$s(x) = \frac{1}{1 + e^{-x}} \qquad (15)$$

The regulatory weights $w_{ij}$ are determined by minimizing the following quadratic error function over the predicted expression values $X'_{ik}$ and the observed values $X_{ik}$:

$$E(w, b) = \frac{1}{2}\sum_i\sum_k (X'_{ik} - X_{ik})^2 \qquad (16)$$

Finally, the interaction weights $w'_{ij}$ for the undirected network are computed by averaging over the forward and backward weights:

$$w'_{ij} = \frac{|w_{ij}| + |w_{ji}|}{2} \qquad (17)$$

Mass-distance

MD by Yona et al. [33] is a similarity measure for expression profiles. It estimates the probability of observing a profile inside the volume delimited by the profiles; the smaller the volume, the more similar the two profiles. Given two expression profiles $X_i$ and $X_j$, the total probability mass of samples whose k-th feature is bounded between the expression values $X_{ik}$ and $X_{jk}$ is calculated as

$$\mathrm{MASS}_k(X_i, X_j) = \sum_{\min(X_{ik}, X_{jk}) \le x \le \max(X_{ik}, X_{jk})} \mathrm{freq}(x) \qquad (18)$$

where freq(x) is the empirical frequency. The mass distance $\mathrm{MD}_{ij}$ is defined as the total volume of profiles bounded between the two expression profiles $X_i$ and $X_j$ and is estimated by the product over all coordinates k:

$$\mathrm{MD}_{ij} = \prod_k^n \mathrm{MASS}_k(X_i, X_j) \qquad (19)$$

where n is the length of the expression profiles. Because $\mathrm{MD}_{ij}$ is symmetric and positive, we interpret it directly as an interaction weight $w_{ij}$.

Mutual rank

MR by Obayashi and Kinoshita [34] uses ranked Pearson's correlation as a measure to describe gene co-expression. For a gene i, first Pearson's correlation with all other genes k is computed and ranked. Then the rank achieved for gene j is taken as a score to describe the similarity of the gene expression profiles $X_i$ and $X_j$:

$$\mathrm{rank}_{ij} = \underset{j}{\mathrm{rank}}\left(\mathrm{corr}(X_i, X_k),\; \forall k \ne i\right) \qquad (20)$$

with $\mathrm{corr}(\cdot,\cdot)$ being Pearson's correlation coefficient. The final interaction weight $w_{ij}$ is calculated as the geometric average of the ranked correlation between genes i and j and vice versa:

$$w_{ij} = \sqrt{\mathrm{rank}_{ij}\cdot\mathrm{rank}_{ji}} \qquad (21)$$

Maximal information nonparametric exploration

MINE is a class of statistics by Reshef et al. [28]. The maximal information coefficient (MIC) is part of this class and a novel measure to quantify nonlinear relationships. We computed the MIC for expression profiles $X_i$ and $X_j$ and interpreted the MIC score as an interaction weight:

$$w_{ij} = \mathrm{MIC}(X_i, X_j) \qquad (22)$$

EUCLID

EUCLID is a simple method that uses the Euclidean distance between the normalized expression profiles $X'_i$ and $X'_j$ of two genes as the interaction weight

$$w_{ij} = \sqrt{\sum_k (X'_{ik} - X'_{jk})^2} \qquad (23)$$

where profiles are normalized by computing the absolute difference of expression values $X_{ik}$ to the median expression in profile $X_i$:

$$X'_{ik} = |X_{ik} - \mathrm{median}(X_i)| \qquad (24)$$
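The mutual-rank computation of Equations (20) and (21) can be sketched as follows (our own illustration, not code from the paper). Because the text defines the weight as a geometric average, the sketch uses the square root of the rank product; note that under this convention smaller weights indicate stronger co-expression, since rank 1 is the most correlated partner:

```python
def mutual_rank_weights(expr, corr):
    """Mutual-rank weights from a list of expression profiles.

    `corr` is a correlation function such as Pearson's. rank[i][j] is
    the rank of gene j among gene i's correlation partners (rank 1 =
    most correlated, Equation 20); the weight is the geometric mean
    of rank[i][j] and rank[j][i] (Equation 21).
    """
    n = len(expr)
    rank = [[0] * n for _ in range(n)]
    for i in range(n):
        partners = sorted((k for k in range(n) if k != i),
                          key=lambda k: corr(expr[i], expr[k]),
                          reverse=True)
        for r, k in enumerate(partners, start=1):
            rank[i][k] = r
    return [[(rank[i][j] * rank[j][i]) ** 0.5 if i != j else 0.0
             for j in range(n)] for i in range(n)]
```

Taking the geometric mean of both directions makes the score symmetric even though each gene's ranking of its partners is not.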
Z-SCORE

Z-SCORE is a network inference strategy by Prill et al. [7] that takes advantage of knockout data. It assumes that a knockout affects directly interacting genes more strongly than others. The z-score $z_{ij}$ describes the effect of a knockout of gene i in the k-th experiment on gene j as the normalized deviation of the expression level $X_{jk}$ of gene j for experiment k from the average expression $\mu(X_j)$ of gene j:

$$z_{ij} = \left|\frac{X_{jk} - \mu(X_j)}{\sigma(X_j)}\right| \qquad (25)$$

The original Z-SCORE method requires knowledge of the knockout experiment k and is therefore not directly applicable to data from multifactorial experiments. The method, however, can easily be generalized by assuming that the minimum expression value within a profile indicates the knockout experiment [$\min(X_j) = X_{jk}$]. Equation (25) then becomes

$$w_{ij} = \left|\frac{\min(X_j) - \mu(X_j)}{\sigma(X_j)}\right| \qquad (26)$$

and the method can be applied to knockout, knockdown and multifactorial data. Note that $z_{ij}$ is an asymmetric score, and we therefore take the maximum of $z_{ij}$ and $z_{ji}$ to compute the final interaction weight:

$$w_{ij} = \max(z_{ij}, z_{ji}) \qquad (27)$$

Supervised

A great variety of supervised machine learning methods has been developed. We limit our evaluation to SVMs because they have been successfully applied to the inference of gene regulatory networks [9] and can easily be trained in a semi-supervised setting [10]. We used the SVM implementation SVMLight by Joachims [35] for all evaluations. SVMs are trained by maximizing a constrained, quadratic optimization problem over Lagrange multipliers α:

$$\max_{\alpha} L(\alpha) = \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{N}\alpha_i\alpha_j\, y_i y_j\, \mathbf{x}_i^T\mathbf{x}_j \quad \text{subject to} \quad \sum_{i=1}^{N}\alpha_i y_i = 0 \text{ and } 0 \le \alpha_i \le C\ \forall i \qquad (28)$$

The labels $y_i$ determine the class to which feature vector $\mathbf{x}_i$ belongs, and C is the so-called complexity parameter that needs to be tuned for optimal prediction performance. Once the optimal Lagrange multipliers α are found, a feature vector can be classified by its signed distance $d(\mathbf{x})$ to the decision boundary, which is computed as

$$d(\mathbf{x}) = \sum_{i=1}^{N}\alpha_i y_i\, \mathbf{x}_i^T\mathbf{x} + b \qquad (29)$$

The distance $d(\mathbf{x})$ can be interpreted as a confidence value: the larger the absolute distance, the more confident the prediction, and similar to a correlation value we interpret the distance as an interaction weight.

In contrast to unsupervised methods, e.g. correlation methods, the supervised approach does not operate directly on pairs of expression profiles but on feature vectors that can be constructed in various ways. We computed the outer product of two gene expression profiles $X_i$ and $X_j$ to construct feature vectors:

$$\mathbf{x} = X_i X_j^T \qquad (30)$$

The outer product was chosen because it is commutative, and predicted interactions are therefore symmetric and undirected. A sample set for training the SVM is then composed of feature vectors $\mathbf{x}_i$ that are labeled $y_i = +1$ for gene pairs that interact and $y_i = -1$ for those that do not.

If all gene pairs were labeled, all network interactions would be known and prediction would be unnecessary. In practice and for evaluation purposes, training is therefore performed on a set of labeled samples, and predictions are generated for the samples of a test set. Figure 1 depicts the concept: all samples within the training set are labeled, and all remaining gene pairs serve as test samples.

Figure 1: Extraction of samples for the training and test set from a gene interaction network.

Note that the term 'sample' in the context of supervised learning refers to a feature vector derived from a pair of genes and their expression profiles, whereas a sample in an expression data set refers to the gene expression values for a single experiment, e.g. a gene knockout.

We evaluate the prediction accuracy of the supervised method by generating labeled feature vectors for all gene pairs (samples) of a network.
This entire sample set is then divided into five parts. Each part in turn is used as a test set while the remaining four parts serve as a training set. The total prediction accuracy is averaged over the prediction accuracies achieved during the five iterations (5-fold cross-validation).

Semi-supervised

Data describing regulatory networks are sparse, and typically only a small fraction of the true interactions is known. The situation is even worse for negative data (non-interactions), because experimental validation largely aims to detect, not to exclude, interactions. The case in which all samples within a training data set can be labeled as positive or negative therefore rarely arises in practical network inference problems, and supervised methods are limited to small training data sets, which negatively affects their performance.

Semi-supervised methods strive to take advantage of the unlabeled samples within a training set by taking the distribution of unlabeled samples into account, and can even be trained on positively labeled data only. Figure 2 shows the required labeling of data for the different approaches. Supervised methods require all samples within the training set to be labeled, while unsupervised methods require no labeling at all. Semi-supervised approaches can be divided into methods that need positive and negative samples and methods that operate on positive samples only.

Figure 2: Original labeling of samples for supervised, unsupervised, semi-supervised and positives-only prediction.

The semi-supervised method used in this evaluation is based on the supervised SVM approach described above. The only difference is in the labeling of the training set. In the semi-supervised setting, only a portion of the training samples is labeled. To enable the SVM training, which requires all samples to be labeled, all unlabeled samples within the semi-supervised training data are relabeled as negatives [10]. This approach enables a direct comparison of the same prediction algorithm trained with fully or partially labeled data.

We assigned different percentages (10%, ..., 100%) of true positive and negative, or positive-only, labels to the training set. The prediction performance of the different approaches was then evaluated by 5-fold cross-validation, with equal training/test set sizes for the supervised, semi-supervised, positives-only and unsupervised methods compared.

RESULTS

In the following, we first evaluate the prediction accuracy of unsupervised methods before comparing two selected unsupervised methods with supervised and semi-supervised approaches on simulated data. The last section compares unsupervised and supervised methods on experimental data.

Unsupervised methods

Figure 3 shows the prediction accuracies, measured by AUC, of all unsupervised methods for three different experimental types (knockout, knockdown and multifactorial) and the average AUC over the three types (all). Networks with 10, 30, 50, 70, 90 and 110 nodes were extracted from E. coli and S. cerevisiae, and expression data were simulated with GeneNetWeaver, with the number of samples (experiments) equal to the number of nodes in the network. Every evaluation was repeated 10 times, so each bar represents an AUC averaged over 60 networks, or over 180 networks (all).

Most obvious are the large standard deviations in prediction accuracy across all methods and experimental types. For small networks, the accuracy of a method can easily vary from no better than guessing to close to perfect (see Supplementary Material). While most differences between methods are statistically significant (P-values <0.01, Wilcoxon rank sum test with Bonferroni correction), the differences are largely small, and the ranking of most methods is therefore not stable, depending on the experimental data type, the source network, the subnetwork size and other factors (see Supplementary Material).
diction methods. All the six samples within a sample However, a simple Pearson’s correlation is consist-
set are generated by a four-node network with three ently the second-best performer for all experimental
interactions. types.
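The relabeling scheme described above (a fraction of the known interactions keeps its positive label; every unlabeled pair, including hidden positives, is treated as negative [10]) can be sketched in a few lines. This is a simplified illustration only; the gene names and pair set are invented:

```python
import random

def semi_supervised_labels(pairs, known_positives, labeled_fraction, seed=0):
    """Assign training labels for the semi-supervised setting: keep a
    fraction of the true interactions labeled positive (+1) and relabel
    all remaining (unlabeled) gene pairs as negatives (-1)."""
    rng = random.Random(seed)
    positives = set(known_positives)
    n_labeled = int(labeled_fraction * len(positives))
    labeled = set(rng.sample(sorted(positives), n_labeled))
    # Unlabeled samples, including hidden positives, become negatives.
    return {pair: (1 if pair in labeled else -1) for pair in pairs}

# Hypothetical gene pairs and known interactions.
pairs = [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g2", "g4")]
labels = semi_supervised_labels(pairs, [("g1", "g2"), ("g2", "g3")],
                                labeled_fraction=0.5)
```

With 50% of the two known interactions labeled, exactly one pair stays positive and the other three pairs (one of them a hidden true interaction) are treated as negatives, mimicking the partially labeled training sets compared in the evaluation.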
Figure 3: Prediction accuracy (AUC) of unsupervised methods on multifactorial, knockout, knockdown and averaged (all) data generated by GeneNetWeaver. Ten repeats over networks with 10, ..., 110 nodes, extracted from E. coli and S. cerevisiae. Error bars show standard deviation.

Interestingly, the rank-based correlation methods (SPEARMAN, KENDALL) that are similar to Pearson correlation perform poorly on knockout and knockdown data but well for multifactorial experiments. Obviously, a seemingly minor change from a linear correlation (Pearson) to a rank-based correlation (Spearman) has a dramatic impact on the prediction performance in the case of knockout (and knockdown) data.

With the exception of the Z-SCORE method, prediction accuracies are low in general. Z-SCORE was specifically designed for knockout data and indeed clearly outperforms all other methods for this experimental type, despite its simplicity. It is the only unsupervised method that achieves a good prediction accuracy (AUC ≈ 0.9).

Network size
Figure 3 summarizes results averaged over networks. We also examined how the network size impacts the prediction performance of the various methods. The heat map in Figure 4 is based on the same data as Figure 3, but shows the prediction accuracies (AUC) of the inference methods on multifactorial data for networks with different numbers of nodes (see Supplementary Material for the related figures on knockout and knockdown data).

The rows in Figure 4 are ordered according to mean AUC, and the ranking is therefore identical to that in the multifactorial bar graph in Figure 3. Top performers on average are the correlation methods by Pearson, Spearman and Kendall, with the corrected Spearman method (SPEARMAN-C) achieving the highest mean AUC. However, when focusing on networks of specific size, the best performance is achieved by the EUCLID method for small networks with 10 nodes. Other methods also show different behaviors with respect to network size. Correlation methods clearly achieve higher AUCs for large networks. Similar trends can be observed for MR, MINE, GENIE, MRNET, MRNET-B and CLR. In contrast, SIGMOID, PCIT and MD decrease in prediction accuracy for growing network sizes, while the performance of RN and ARACNE is seemingly unaffected by network size within the investigated size range.

Figure 4: Prediction accuracy (AUC) of unsupervised methods on multifactorial data for different network sizes (nodes). Data generated by GeneNetWeaver and extracted from E. coli and S. cerevisiae.

Sample number
Apart from the size of the network, we also expected the number of samples to have an effect on the prediction accuracy of the inference algorithms. GeneNetWeaver generates gene expression profiles with the same number of samples as network nodes (genes). We therefore used SynTReN to vary network size and sample number independently. The heat map in Figure 5 shows prediction accuracy (AUC) averaged over all inference methods for different network sizes and sample numbers. SynTReN simulates expression data for multifactorial experiments only, and networks were extracted from E. coli. All experiments were repeated 10 times. The results show the expected trend of improving accuracy with increasing number of samples and decreasing size of network.

However, the absolute improvements in prediction accuracy are rather small with additional data, most likely because unsupervised methods can infer only simple network topologies reliably and small sample sets are sufficient for this purpose. For instance, networks with 50 nodes are predicted with an AUC of roughly 0.65 when 50 samples are available. Increasing the sample set size to 110 raises the prediction accuracy only to an AUC of around 0.67.

Figure 5: Prediction accuracy (AUC) averaged over all unsupervised methods on multifactorial data for different network sizes (nodes) and sample numbers. Data generated by SynTReN and extracted from E. coli. Ten repeats.

Supervised methods
Finally, we wanted to compare unsupervised with supervised and semi-supervised approaches. Because of the time-consuming training required for supervised methods, we limited our evaluation to networks with 30 nodes extracted from E. coli networks. Expression profiles were generated with GeneNetWeaver, and each experiment was repeated 10 times.

Figure 6 shows the prediction accuracies (AUC) for supervised and semi-supervised methods for three different experimental types (knockout, knockdown and multifactorial) and the average AUC (all) data. For direct comparison, we included two
unsupervised methods (Z-SCORE, SPEARMAN) in our evaluation of supervised methods. Supervised and semi-supervised methods are labeled 'SVM' followed by the percentage of labeled data (10, 30, 50, 70, 90, 100%). The suffix '+' indicates that only positive data were used and '−' indicates that positive and negative data were used. For instance, 'SVM-70' describes an SVM trained on 70% of labeled data (positive and negative). All evaluations are 5-fold cross-validated, and the complexity parameter C of the SVM was optimized via grid search (0.1 ... 100) for each training fold.

The results show good prediction accuracies for supervised methods on all experimental types, with a slight advantage for knockout data. As expected, performance increases with the percentage of data labeled, but there is little difference between labeling only positive data, and both positive and negative data. Apparently, supervised methods can be trained effectively even when only a portion of network interactions (positives) is known.

Even with as little as 10% of known interactions, semi-supervised methods still outperform unsupervised methods for multifactorial data. The Z-SCORE method is still the top-performing method on knockout data, but supervised methods are not far behind and considerably outperform Spearman correlation. For knockdown data, the Z-SCORE method loses its top rank, and semi-supervised methods perform better when at least 70% of the data are labeled.

To summarize, apart from the Z-SCORE method on knockout data, supervised and semi-supervised approaches considerably outperform unsupervised methods and achieve good prediction accuracies in general for networks of this size.

Figure 6: Prediction accuracy (AUC) of supervised methods on multifactorial, knockout, knockdown and averaged (all) data generated by GeneNetWeaver. Results are for 5-fold cross-validation and 10 repeats over networks with 30 nodes, extracted from E. coli. Error bars show standard deviation.
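As a minimal illustration of the correlation baseline and the evaluation metric used throughout, the sketch below scores gene pairs by absolute Pearson correlation and computes AUC via the rank (Mann-Whitney) identity. The expression values and the gold-standard edge are invented for illustration:

```python
from itertools import combinations

def pearson(x, y):
    """Plain Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def auc(pos_scores, neg_scores):
    """AUC via the Mann-Whitney identity: the probability that a true
    interaction is scored above a non-interaction (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Toy expression matrix: rows = genes, columns = experiments (values invented).
expr = {"g1": [0.1, 0.5, 0.9, 1.4, 2.0],
        "g2": [0.2, 0.9, 1.9, 2.9, 4.1],   # tracks g1 almost linearly
        "g3": [1.0, 0.2, 1.1, 0.4, 0.9]}   # unrelated
true_edges = {("g1", "g2")}                 # hypothetical gold standard

scores = {pair: abs(pearson(expr[pair[0]], expr[pair[1]]))
          for pair in combinations(sorted(expr), 2)}
pos = [s for pair, s in scores.items() if pair in true_edges]
neg = [s for pair, s in scores.items() if pair not in true_edges]
print(auc(pos, neg))  # -> 1.0 on this toy example (perfect ranking)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking of interacting over non-interacting pairs, which is how the bar heights in Figures 3 and 6 should be read.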
Experimental data
All results in the previous sections are based on simulated data. In this section, we analyze the performance of the inference methods on the experimental data described in Methods. For more-precise comparison, we applied the same methodology and used GeneNetWeaver to randomly extract subnetworks with 30 nodes from the experimental networks. Note, however, that GeneNetWeaver was not used to simulate gene expression—here we use empirical data [8] instead—and that the E. coli and S. cerevisiae source networks are different from those on which the simulation was based.

Figure 7 shows the prediction accuracy of the supervised and unsupervised methods for experimental networks and gene expression data of E. coli and S. cerevisiae. For better comparison, only the first 30 samples from the expression data sets were used. Results based on the complete expression data can be found in the Supplementary Material.

Figure 7: Prediction accuracy (AUC) of supervised and unsupervised methods for networks with 30 nodes extracted from E. coli and S. cerevisiae. The first 30 samples of the corresponding experimental expression data are used. AUCs are averaged over 10 repeats and error bars show standard deviation. Results for supervised methods are 5-fold cross-validated.

The results on the experimental data are in good agreement with the simulation results (Figures 3 and 6). The accuracy of the unsupervised methods remains low, while the supervised methods perform dramatically better. Even with only 10% of known interactions as a training source, the supervised methods outperform the unsupervised methods by a wide margin. The ranking of the unsupervised methods is inconsistent between data sets, but this is of little significance owing to their low accuracy and the large variances. Using the full expression data set (instead of only 30 samples) does not improve the
accuracy of the best-performing unsupervised methods (see Supplementary Material).

DISCUSSION
Directed interactions and self-interactions
Large-scale evaluations, including DREAM, measure prediction accuracy by comparing the topology of the inferred network with a known, true network. While this true network can contain directed interactions and loops, only a small subset of methods can infer direction or loops natively (e.g. GENIE [27]). GeneNetWeaver extracts directed edges and self-interactions if present, but to make our comparison direct and fair, we subsequently ignore them in computing AUC.

Unsupervised methods
Unsupervised methods are attractive because they do not require knowledge about the network to be inferred, and can be applied directly without a time-consuming training process. These are important practical advantages in comparison with supervised methods because reliable network data are often unavailable and training times for larger networks can become prohibitive. Parameter optimization would have removed these benefits and rendered unsupervised methods (semi-)supervised. On the other hand, as Madhamshettiwar et al. [3] have shown, parameter optimization can improve prediction accuracy.

Simulated data
While simulators such as GeneNetWeaver generate expression data that are in good agreement with biological measurements [6], they remain incomplete models, e.g. posttranscriptional regulation and chromatin states are missing, and an evaluation of inference methods on real data would clearly be preferable. However, currently known network structures, even for well-characterized organisms, are fragmentary and only partially correct representations of the interactions between genes [4]. Consequently, there is an unknown but probably large discrepancy between the expression data measured and the observed part of the actual network that generates them, rendering assessment of inference methods on observed gene regulatory networks and their expression values difficult. We therefore have largely focused our evaluation on in silico benchmarks, but methods that fail for simulated data are unlikely to succeed in the inference of real biological networks [21].

Linear SVMs
Another limitation of our study is the focus on linear SVMs for the evaluation of supervised and semi-supervised methods. We preferred linear SVMs over more-powerful nonlinear methods for two reasons. Firstly, linear SVMs are considerably faster to train and have fewer parameters to optimize than nonlinear SVMs—a significant advantage in a comprehensive study. Secondly, identifying a complex system with many variables (interaction weights) from a small number of samples calls for a simple predictor. We also tried to evaluate transductive SVMs [37] but found them time-consuming to train, and they achieved accuracies considerably lower than the semi-supervised SVMs (data not shown). We therefore did not perform a full evaluation and do not report results for transductive SVMs.

Feature vectors
We construct feature vectors by computing the outer product of the expression profiles of two genes. Cerulo et al. [10] constructed feature vectors by concatenating the two expression profiles. The outer product results in larger feature vectors (N² versus 2N) but is independent of the order of the gene pair. The training set is therefore half the size compared with the concatenation approach [n(n−1)], and we achieved higher prediction accuracies with the linear SVM. Cerulo et al. [10], however, used nonlinear SVMs (RBF) that might achieve the same or better accuracies on concatenated feature vectors but are more time-consuming to train and require two parameters (C, γ) to be optimized. It therefore remains an open question which method is preferable.

SIRENE by Mordelet and Vert [9] takes a different approach, with SVMs trained on feature vectors derived from single profiles. However, it requires knowledge about the transcription factors amongst the genes, and cannot predict interactions between target genes. Because each transcription factor is assigned a separate SVM, feature vectors are of length N and the training set has only n samples; the individual SVMs can be trained efficiently, but training time is multiplied by the number of transcription factors.
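The two pairwise feature constructions discussed above can be contrasted in a short sketch. Note that the symmetrized outer product shown here is our illustration of an order-independent variant; it is not necessarily the exact construction used in the article:

```python
def outer_features(x, y):
    """Pairwise feature vector built from the outer product of two
    expression profiles (length N*N). Summing both orientations makes
    the vector symmetric in the gene pair (illustrative assumption)."""
    n = len(x)
    return [x[i] * y[j] + y[i] * x[j] for i in range(n) for j in range(n)]

def concat_features(x, y):
    """Concatenation-style feature vector (length 2N, order dependent),
    in the spirit of Cerulo et al. [10]."""
    return list(x) + list(y)

x, y = [1.0, 2.0, 3.0], [0.5, 0.0, 1.5]
assert outer_features(x, y) == outer_features(y, x)    # order independent
assert len(outer_features(x, y)) == len(x) ** 2        # N^2 features
assert concat_features(x, y) != concat_features(y, x)  # order dependent
```

The order independence is what halves the number of training samples relative to the concatenation approach, since each unordered gene pair contributes a single feature vector.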
Unbalanced data sets
Gene regulatory networks tend to be sparse, with the number of positive samples (interactions) typically much smaller than the number of negative samples (non-interactions). Consequently, data sets for the training of supervised methods are heavily unbalanced, and this could have a negative impact on the prediction accuracy of the classifier. We therefore tried to weight positive and negative samples inversely to their ratio, but did not observe any improvements in prediction accuracy (data not shown). All evaluations in this article were therefore performed with equally weighted (w = 1) samples.

Effect of sample number
We studied the effect of sample number on the prediction accuracy using an over-determined system (more samples than genes), and found only a marginal improvement for larger sample numbers. For an under-determined system (fewer samples than genes), increasing the number of samples is likely to be more beneficial. However, large networks cannot be inferred reliably (partly owing to a lack of data) and we therefore focused our evaluation on small networks, for which sufficient data (number of samples matching the number of genes) are often available in practice.

Network inference
The evaluation results reveal large variations in prediction accuracies across all methods. Nonlinear methods such as MINE do not perform better than linear Pearson's correlation, and in general, we find that complex methods are no more accurate than simple methods. The Z-SCORE method and Pearson correlation are the two best-performing unsupervised methods.

A detailed analysis revealed that unsupervised approaches work well for simple network topologies (e.g. star topology) and networks with exclusively activating or inhibiting interactions, but fail for more complex cases (see Supplementary Material). Mixed regulatory interactions constitute a fundamental problem for unsupervised network inference, as depicted in Figure 8.

Let gene A inhibit gene D but let gene B activate the same gene D. Given the expression profiles of genes A and B as shown in Figure 8, and assuming identical interaction weights but with opposite signs, the profile for gene D, resulting from a linear combination, is most similar to that of gene C and different from A or B. Consequently, the most appropriate but erroneous conclusion is to infer a regulatory relationship between C and D. Without any further information (e.g. knockouts, existing interactions), any method that infers interactions from the similarity of expression profiles alone is prone to fail in this common case. Schaffter et al. [12] identify other common network motifs and the methods that tend to infer them incorrectly, and Krishnan et al. [38] show that networks of a certain complexity cannot be reverse-engineered from expression data alone.

Figure 8: Gene A inhibits gene D, and gene B activates gene D. The resulting expression profile of gene D is, however, most similar to that of gene C, which does not regulate gene D.

Experimental data
We evaluated all inference methods on simulated and experimental data, and found the results to be in good agreement. Specifically, the supervised methods consistently achieved much higher accuracies than unsupervised methods on both simulated and experimental data, and prediction accuracies for simulated and experimental data were similar. Marbach et al. [8] report substantially lower accuracies (of unsupervised methods) for S. cerevisiae and attribute this to the increased regulatory complexity and prevalence of posttranscriptional regulation in eukaryotes and/or to the lower coverage of the S. cerevisiae network.

In our evaluations, the accuracies for S. cerevisiae were only slightly lower than for E. coli, but note that Marbach et al. [8] infer the complete network with >5000 nodes, while we randomly extract small subnetworks with 10–110 nodes and calculate an average accuracy. Any part of the S. cerevisiae network with low coverage would affect the average accuracy less than it would the overall accuracy. It may be unnecessary to invoke regulatory complexity, especially considering the high accuracies of the supervised methods.
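The mixed-regulation scenario of Figure 8 can be reproduced numerically. In this sketch the profile values are invented: D follows the linear combination B - A, and a correlation-based method would rank the non-regulator C above the true regulators A and B:

```python
def pearson(x, y):
    """Plain Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical profiles: A inhibits D and B activates D with equal weight,
# so D follows the linear combination B - A. C does not regulate D but
# happens to have a similar profile.
A = [1, 2, 0, 2, 1, 0]
B = [0, 1, 2, 1, 0, 2]
D = [b - a for a, b in zip(A, B)]          # [-1, -1, 2, -1, -1, 2]
C = [-1.0, -0.8, 2.0, -1.2, -1.0, 2.0]     # similar to D, no regulation

# D correlates more strongly with the non-regulator C than with either
# of its true regulators, so similarity-based inference would pick C.
assert abs(pearson(D, C)) > abs(pearson(D, A))
assert abs(pearson(D, C)) > abs(pearson(D, B))
```

Any pairwise similarity measure faces the same failure mode here, which is why additional information such as knockouts or known interactions is needed to resolve mixed regulation.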
Marbach et al. [8] furthermore suggest that incorporating additional information such as transcription-factor binding and chromatin modification data or promoter sequences might improve the accuracy of prediction. We have performed evaluations with added transcription-factor binding data but could not improve prediction accuracies (data not shown).

Applications of inference methods
The inference of regulatory interactions from expression data has the potential to capture important associations between key genes regulating transcription in various biological conditions. Concerns about the accuracy of inference methods have led to the development of approaches that include GRN inference as one part of a broader computational framework. For example, He et al. [39] used modified correlation network inference to identify key genes involved in the immune suppressor function of human regulatory T cells. These authors combined two correlation-based approaches [36, 40] capable of identifying patterns of correlation across time-series microarray expression data. They then assessed the quality of their inferred networks by the extent to which interacting proteins shared biological process annotations, and selected key functional hubs from the inferred network using a metric computed from the results of literature mining and expression differences. They found that 6 of the top 10 hubs so ranked were already known to be involved in the suppressor function of regulatory T cells, and went on to experimentally characterize the novel role of the hub gene plasminogen activator urokinase (PLAU) in this suppressor function.

Similarly, Della Gatta et al. [41] applied the inference method ARACNE to reverse-engineer a genome-scale GRN in leukemia. To define the oncogenic regulatory circuits controlled by the homeobox transcription factors TLX1 and TLX3, these investigators then used ChIP-chip data for these transcription factors along with expression data to identify direct targets of the TLX genes in the inferred network, requiring target genes to be differentially expressed between tumors that express TLX1 and TLX3. They then selected the most highly connected hub, RUNX1, as a putative master regulator of the TLX1 and TLX3 transcriptional programs, and validated the predictions of their network model with further ChIP-chip analysis of RUNX1.

These examples illustrate how GRN inference can productively be combined with downstream analysis: a GRN is initially inferred from experimental data, features of interest are identified based on network topology, and interesting features are then ranked by applying additional criteria derived from experimental evidence, functional annotation or literature. Both He et al. [39] and Della Gatta et al. [41] used GRNs to identify highly connected nodes as features of interest, and then either rank these hubs, or refine the networks, by use of additional evidence. Interesting hubs prioritized in this way then form the basis of hypotheses that can be subjected to experimental investigation. Such hybrid approaches leverage the power of GRN inference methods, while controlling for variability in prediction performance by using additional criteria to select network features of interest.

CONCLUSION
Perhaps the most important observation from this evaluation is the large variance in prediction accuracies across all methods. In agreement with Haynes and Brent [17], we find that a large number of repeats on networks of varying size is required for reliable estimates of the prediction accuracy of a method. Evaluations on single data sets—especially on real data—are unsuitable to establish differences in the prediction accuracy of inference methods.

On average, unsupervised methods achieve low prediction accuracies, with the notable exception of the Z-SCORE method, and are considerably outperformed by supervised and semi-supervised methods. Simple correlation methods such as Pearson correlation are as accurate as much more complex methods, yet much faster and parameterless. Unsupervised methods are appropriate only for the inference of simple networks that are entirely composed of inhibitory or activating interactions, but not both.

The Z-SCORE method achieved the best prediction accuracy of all methods on knockout data. However, experimental knockouts cannot be performed systematically, or at all, in many important biological systems. The method also fails when a gene is regulated by an or-junction of two genes. On multifactorial data, the supervised and semi-supervised methods achieved the highest accuracies; even with as few as 10% of known interactions, the semi-supervised methods still outperformed all unsupervised approaches. There was little difference in prediction accuracy for semi-supervised methods
trained on positively labeled data only, compared with training on positive and negative samples. Apparently, semi-supervised methods can effectively be trained on partial interaction data, and non-interaction data are not essential.

These results have important implications for the application of network inference methods in systems biology. Even the best methods are accurate only for small networks of relatively simple topology, which means that large-scale or genome-scale regulatory network inference from expression data alone is currently not feasible. If inference methods are to be applied to data of the scale generated by modern microarray platforms, a feature selection step is usually required to reduce the size of the inference problem; attempts to apply network inference to such large-scale datasets may be premature, and consideration should be given to focusing the biological question to use smaller-scale, higher-quality experimental data.

Our analysis also indicates that certain kinds of biological data are more amenable for accurate network inference than others. Most microarray datasets are most similar to our multifactorial data sets, which yielded poorly inferred networks with unsupervised methods. Increasing the number of samples in the experiment (a common strategy to improve inference) does not in fact generate the hoped-for improvements. More useful are knockout data, which our simulations show contain more useful information and support higher-quality inference. Biologists who wish to gain insight into regulatory architecture should consider these limitations when designing experiments.

To summarize, small networks (≤50 nodes) can be inferred with high accuracy (AUC ≈ 0.9), even with small numbers of samples, using supervised techniques or the Z-SCORE method. However, even with the best-performing methods, large variations in prediction accuracy remain, and predictions may be limited to undirected networks without self-interactions.

SUPPLEMENTARY DATA
Supplementary data are available online at http://bib.oxfordjournals.org/

Key points
- Prediction accuracies strongly depend on the complexity of the network topology.
- Supervised methods generally achieve higher prediction accuracies than unsupervised methods.
- Unsupervised inference methods are accurate only for networks with simple topologies.
- An exception is the unsupervised Z-SCORE method that shows the highest accuracy on simulated knockout data.
- Knockout data are more informative for network inference than knockdown or multifactorial data.

FUNDING
Australian Research Council (DP110103384 and CE0348221).

References
1. Pe'er D, Hacohen N. Principles and strategies for developing network models in cancer. Cell 2011;144:864–73.
2. Elnitski L, Jin V, Farnham P, et al. Locating mammalian transcription factor binding sites: a survey of computational and experimental techniques. Genome Res 2006;16(12):1455–64.
3. Madhamshettiwar P, Maetschke S, Davis M, et al. Gene regulatory network inference: evaluation and application to ovarian cancer allows the prioritization of drug targets. Genome Med 2012;4(5):41.
4. Stolovitzky G, Monroe D, Califano A. Dialogue on reverse-engineering assessment and methods: the DREAM of high-throughput pathway inference. Ann N Y Acad Sci 2007;1115:1–22.
5. Stolovitzky G, Prill R, Califano A. Lessons from the DREAM2 challenges. Ann N Y Acad Sci 2009;1158:159–95.
6. Marbach D, Prill R, Schaffter T, et al. Revealing strengths and weaknesses of methods for gene network inference. Proc Natl Acad Sci USA 2010;107(14):6286–91.
7. Prill RJ, Marbach D, Saez-Rodriguez J, et al. Towards a rigorous assessment of systems biology models: The DREAM3 challenges. PLoS One 2010;5(2):e9202.
8. Marbach D, Costello J, Küffner R, et al. Wisdom of crowds for robust gene network inference. Nat Methods 2012;9(8):796–804.
9. Mordelet F, Vert JP. SIRENE: supervised inference of regulatory networks. Bioinformatics 2008;24:i76–82.
10. Cerulo L, Elkan C, Ceccarelli M. Learning gene regulatory networks from only positive and unlabeled data. BMC Bioinformatics 2010;11:228.
11. Faith JJ, Hayete B, Thaden JT, et al. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 2007;5(1):e8.
12. Schaffter T, Marbach D, Floreano D. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics 2011;27(16):2263–70.
13. Margolin A, Nemenman I, Basso K, et al. ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 2006;7(Suppl 1):S7.
14. Meyer P, Kontos K, Lafitte F, Bontempi G. Information-theoretic inference of large transcriptional regulatory networks. EURASIP J Bioinform Syst Biol 2007;2007:79879.
15. Altay G, Emmert-Streib F. Revealing differences in gene network inference algorithms on the network level by ensemble methods. Bioinformatics 2010;26(14):1738–44.
16. Lopes FM, Martins DC, Cesar RM. Comparative study of GRNs inference methods based on feature selection by mutual information. In: IEEE International Workshop on Genomic Signal Processing and Statistics. Guadalajara, JA, Mexico, 2009.
17. Haynes B, Brent M. Benchmarking regulatory network reconstruction with GRENDEL. Bioinformatics 2009;25(6):801–7.
18. Werhli A, Grzegorczyk M, Husmeier D. Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical Gaussian models and Bayesian networks. Bioinformatics 2006;22(20):2523–31.
19. Camacho D, Vera Licona P, Mendes P, Laubenbacher R. Comparison of reverse-engineering methods using an in silico network. Ann N Y Acad Sci 2007;1115:73–89.
20. Cantone I, Marucci L, Iorio F, et al. A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell 2009;137:172–81.
21. Bansal M, Belcastro V, Ambesi-Impiombato A, di Bernardo D. How to infer gene networks from expression profiles. Mol Syst Biol 2007;122:78.
22. Husmeier D. Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics 2003;19(17):2271–82.
23. Baldi P, Brunak S, Chauvin Y, et al. Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics 2000;16(5):412–24.
24. Marbach D, Schaffter T, Mattiussi C, et al. Generating realistic in silico gene networks for performance assessment of reverse engineering methods. J Comput Biol 2009;16(2):229–39.
25. Van den Bulcke T, Van Leemput K, Naudts B, et al.
28. Reshef DN. Detecting novel associations in large data sets. Science 2011;334:1518–24.
29. Reverter A, Chan EK. Combining partial correlation and an information theory approach to the reversed engineering of gene co-expression networks. Bioinformatics 2008;24:2491–7.
30. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008;9:559.
31. Butte AJ, Kohane IS. Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Pac Symp Biocomput 2000;1:418–29.
32. Patrick M, Daniel M, Sushmita R, et al. Information-theoretic inference of gene networks using backward elimination. In: The 2010 International Conference on Bioinformatics and Computational Biology, 2010.
33. Yona G, Dirks W, Rahman S, Lin DM. Effective similarity measures for expression profiles. Bioinformatics 2006;22(13):1616–22.
34. Obayashi T, Kinoshita K. Rank of correlation coefficient as a comparable measure for biological significance of gene expression. DNA Res 2009;16:249–60.
35. Joachims T. Making large-scale SVM learning practical. In: Schölkopf B, Burges C, Smola A (eds). Advances in Kernel Methods - Support Vector Learning. Cambridge, MA: MIT Press, 1999;169–84.
36. He F, Zeng AP. In search of functional association from time-series microarray data based on the change trend and level of gene expression. BMC Bioinformatics 2006;7:69.
37. Joachims T. Retrospective on transductive inference for text classification using support vector machines. In: Proceedings of the International Conference on Machine Learning (ICML). Montreal, Quebec, 2009.
38. Krishnan A, Giuliani A, Tomita M. Indeterminacy of reverse engineering of gene regulatory networks: the curse of gene elasticity. PLoS One 2007;2(6):e562.
39. He F, Chen H, Probst-Kepper M, et al. PLAU inferred from a correlation network is critical for suppressor function of regulatory T cells. Mol Syst Biol 2012;8:624.
40. Qian J, Dolled-Filhart M, Lin J, et al. Beyond synexpression relationships: local clustering of time-shifted and inverted
SynTReN: a generator of synthetic gene expression data gene expression profiles identifies new, biologically relevant
for design and analysis of structure learning algorithms. interactions. J Mol Biol 2001;314(5):1053–66.
BMC Bioinformatics 2006;7:43.
41. Della Gatta G, Palomero T, Perez-Garcia A, et al. Reverse
26. Meyer P, Lafitte F, Bontempi G. minet: A R/Bioconductor engineering of TLX oncogenic transcriptional networks
package for inferring large transcriptional networks using identifies RUNX1 as tumor suppressor in T-ALL.
mutual information. BMC Bioinformatics 2008;9:461. Nat Med 2012;18(3):436–40.
27. Huynh-Thu V, Irrthum A, Wehenkel L, et al. Inferring
regulatory networks from expression data using tree-based
methods. PLoS One 2010;5(9):e12776.
