
Tutorial: Introduction to the Statistical Inference of Regulatory Networks
Frank Emmert-Streib
Computational Biology and Machine Learning Lab
Center for Cancer Research and Cell Biology
Queen's University Belfast, UK
16th September 2012, Belfast
Frank Emmert-Streib Inference of Regulatory Networks 1
Level of the tutorial
general introduction to the problem
emphasize the principal ideas rather than technical details
show how to use the presented methods practically, using R
assumptions:
basic statistical and biological knowledge
no experience in network inference
Availability of the Tutorial
Web: www.bio-complexity.com
Email: v@bio-complexity.com
References
A list of references cited throughout this tutorial is provided at the
end.
Overall outline of the tutorial
1. Motivation (yesterday: Introduction to quantitative and network biology)
2. Network Inference Methods
3. Evaluation measures
4. Data
   biological data (DNA microarray, RNA-seq)
   simulated data
5. Practical examples - using R (statistical programming language)
6. Applications
   comparison of different inference methods
   B-cell lymphoma
7. Summary
Part I
Network Inference Methods
Overview of part I
general overview of network inference methods [14]
(from gene expression data - concentration of mRNAs)
distinction between association and causal networks
discussion of the principal working mechanism of some network
inference methods
Network Inference Methods
correlation-based:
   Co-expression networks
   Low-order partial correlation
   GGM (graphical Gaussian model)
mutual information-based:
   RN (relevance networks)
   ARACNE
   CLR (context likelihood of relatedness)
   C3NET (conservative causal core)
   MRNET (maximum relevance, minimum redundancy)
(figure: the methods are further divided into causal networks and association networks)
Defining property
Causal (gene) network:
an edge in the network corresponds to a biochemical interaction
that can be validated experimentally
Examples of causal interactions:
protein-protein interaction
protein-DNA binding
(figure: abstract notation of a causal interaction)
Example of an association:
(figure: abstract notation of an association)
Problems:
conducting a wet-lab experiment for two genes that do not
directly interact with each other requires guesswork
example: differential expression of genes
Potential applications of the inferred networks:
study individual networks
comparison of gene networks to identify (causal) interaction
changes (between conditions, disease stages or grades, etc.)
(figure: evaluation of the inferred network against the true network by structural comparison: do the networks look similar? do the networks have the same biological function?)
Inference methods (principal ideas)
Co-expression networks - association networks
The correlation coefficient $\rho_{ij}$ (between gene i and j) determines the
connection between gene i and j:
Adjacency matrix,
$$A_{ij} = A_{ji} = \begin{cases} 0 & \text{if } \rho_{ij} < \rho_0 \text{ (hypothesis test)} \\ 1 & \text{otherwise.} \end{cases}$$
This gives an undirected, unweighted network.
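The thresholding rule above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the function name `coexpression_hard`, the cutoff `rho0 = 0.5` and the toy data are hypothetical, and a fixed cutoff stands in for the hypothesis test mentioned on the slide.

```r
# Hard-threshold co-expression network (sketch).
# expr: samples x genes matrix; rho0: illustrative cutoff standing in for the hypothesis test.
coexpression_hard <- function(expr, rho0 = 0.5) {
  rho <- cor(expr)               # Pearson correlation for all gene pairs
  A <- (rho >= rho0) * 1         # A_ij = 0 if rho_ij < rho0, 1 otherwise
  diag(A) <- 0                   # ignore self-correlation
  A
}

# Toy data: g2 is a noisy copy of g1, g3 is independent noise.
set.seed(1)
g1 <- rnorm(100)
expr <- cbind(g1 = g1, g2 = g1 + rnorm(100, sd = 0.1), g3 = rnorm(100))
A <- coexpression_hard(expr)
```

Because the correlation matrix is symmetric, the resulting adjacency matrix is symmetric as well, which is why the network is undirected.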
Co-expression networks - association networks
The correlation $\rho_{ij}$ (between gene i and j) is filtered according to:
The sigmoid function,
$$A_{ij} = \frac{1}{1 + \exp(-\alpha(\rho_{ij} - \rho_0))},$$
or the power adjacency function,
$$A_{ij} = |\rho_{ij}|^{\beta}.$$
Both types of functions lead to undirected but weighted networks.
It has been suggested to choose the above parameters in a way that
the resulting network has approximately a scale-free degree
distribution [34].
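A minimal sketch of the two soft-thresholding functions in base R. The parameter names `alpha`, `rho0` and `beta` and their default values are assumptions chosen for illustration; the slide only states that such parameters exist and should be tuned towards a scale-free degree distribution.

```r
# Soft-threshold adjacency functions (sketch); parameter values are illustrative.
sigmoid_adjacency <- function(rho, alpha = 10, rho0 = 0.5) {
  1 / (1 + exp(-alpha * (rho - rho0)))   # smooth step centred at rho0
}
power_adjacency <- function(rho, beta = 6) {
  abs(rho)^beta                          # suppresses weak correlations
}

rho <- c(-0.9, 0, 0.5, 0.9)
w_sigmoid <- sigmoid_adjacency(rho)
w_power <- power_adjacency(rho)
```

Both functions return weights in [0, 1]; the power function ignores the sign of the correlation, whereas the sigmoid as written here does not.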
Low-order partial correlation
Starting from a fully connected adjacency matrix A [11].
Zero-order:
If $\rho_{ij} = \rho(X_i, X_j) = 0$ (statistical test) $\Rightarrow$ the connection between
gene $X_i$ and $X_j$ is deleted, $A_{ij} = A_{ji} = 0$.
First-order:
$\rho_{ij|k} = \rho_{X_i, X_j | X_k}$ are calculated for all triplets of genes with $\rho_{ij} \neq 0$.
If there is at least one gene $X_k$ for which $\rho_{ij|k} = 0$ holds
(hypothesis test) $\Rightarrow$ the connection between gene $X_i$ and $X_j$ is
deleted, $A_{ij} = A_{ji} = 0$.
n-th-order:
analogously
The resulting network from this procedure is frequently called an
undirected dependency graph (UDG) [28].
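The zero- and first-order pruning steps can be sketched in base R as follows. This is not the ParCorA implementation: the function names are hypothetical, the first-order partial correlation uses the standard textbook formula, and the Fisher z-transform test with significance level `alpha` is an assumption of this sketch.

```r
# First-order partial correlation of genes i and j given k, from a correlation matrix r.
pcor1 <- function(r, i, j, k) {
  (r[i, j] - r[i, k] * r[j, k]) / sqrt((1 - r[i, k]^2) * (1 - r[j, k]^2))
}

# Two-sided p-value for H0: correlation = 0 (Fisher z-transform; n samples,
# q conditioning genes). This particular test is an assumption of the sketch.
fisher_p <- function(rho, n, q = 0) {
  2 * pnorm(-abs(atanh(rho)) * sqrt(n - q - 3))
}

prune_low_order <- function(expr, alpha = 0.01) {
  n <- nrow(expr); p <- ncol(expr)
  r <- cor(expr)
  A <- matrix(1, p, p); diag(A) <- 0            # start fully connected
  A[which(fisher_p(r, n) > alpha)] <- 0         # zero-order step
  for (i in seq_len(p - 1)) for (j in (i + 1):p) {
    if (A[i, j] == 0) next                      # only pairs that survived zero order
    for (k in setdiff(seq_len(p), c(i, j))) {   # first-order step
      if (fisher_p(pcor1(r, i, j, k), n, q = 1) > alpha) {
        A[i, j] <- A[j, i] <- 0                 # one explaining gene k is enough
        break
      }
    }
  }
  A
}

# Toy chain 1 - 2 - 3: genes 1 and 3 are only indirectly related.
set.seed(1)
x1 <- rnorm(500); x2 <- x1 + rnorm(500); x3 <- x2 + rnorm(500)
A <- prune_low_order(cbind(x1, x2, x3))
```

In the toy chain the direct links 1-2 and 2-3 survive, while the indirect link 1-3 is a candidate for deletion because its partial correlation given gene 2 is zero in the population.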
Assumption: best-model configuration
(figure: pruning of an 8-gene example network. Initialization: fully connected 8 x 8 adjacency matrix. Correlation of zero-order: estimated network A^0 shown next to the true network. Correlation of first-order: estimated network A^1 shown next to the true network, e.g. testing $\rho(3,4|k)$.)
The partial correlation of first order:
$$\rho_{x_i x_j | x_k} = \frac{\rho_{x_i x_j} - \rho_{x_i x_k}\,\rho_{x_j x_k}}{\sqrt{1 - \rho^2_{x_i x_k}}\,\sqrt{1 - \rho^2_{x_j x_k}}}$$
GGM (graphical Gaussian model)
A GGM is given by a specific structure of the inverse of the covariance
matrix, $\Omega = \Sigma^{-1}$ (precision or concentration matrix). Network inference
methods based on GGM utilize the relation,
$$\rho_{ij|V\setminus\{ij\}} = -\frac{\omega_{ij}}{\sqrt{\omega_{ii}\,\omega_{jj}}},$$
connecting the partial correlation of full-order with the elements of $\Omega$, $\omega_{ij}$.
Start with an unconnected adjacency matrix A.
If $\rho_{ij|V\setminus\{ij\}} \neq 0$ (hypothesis test) we include an edge, $A_{ij} = A_{ji} = 1$,
otherwise there is no edge between i and j .
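The relation between the precision matrix and the full-order partial correlations can be checked numerically with a small base R sketch. The function name `pcor_full` and the example covariance matrix are assumptions made for illustration; the hypothesis test itself is not shown.

```r
# Full-order partial correlations from a covariance matrix S (sketch).
pcor_full <- function(S) {
  omega <- solve(S)                                     # precision matrix
  pc <- -omega / sqrt(outer(diag(omega), diag(omega)))  # -omega_ij / sqrt(omega_ii omega_jj)
  diag(pc) <- 1
  pc
}

# Covariance of the chain x1 -> x2 -> x3 with unit-variance noise added at each step.
S <- matrix(c(1, 1, 1,
              1, 2, 2,
              1, 2, 3), nrow = 3, byrow = TRUE)
pc <- pcor_full(S)
```

For this chain the partial correlation between genes 1 and 3 vanishes, so no edge would be included between them. When applying the function to data via `cov(expr)`, the sample covariance is only invertible if there are more samples than genes; otherwise a regularized estimator is required.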
RN (relevance networks) - association networks
The principal idea of RN [6] is to compute the mutual information (MI)
values for all pairs of genes and to declare a value as
relevant if it is larger than a given threshold $I_0$.
Start with a fully connected adjacency matrix A.
If $I_{ij} = 0$ (hypothesis test) $\Rightarrow$ the connection between gene $X_i$ and $X_j$
is deleted, $A_{ij} = A_{ji} = 0$.
Mutual information:
$$I(X, Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} \in [0, \infty)$$
captures linear and nonlinear relations between random variables
mutual information needs to be estimated:
Influence of statistical estimators of mutual information and data
heterogeneity on the inference of gene regulatory networks
de Matos Simoes R & Emmert-Streib F, PLoS One. 2011
6(12):e29279.
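To make the definition concrete, here is a minimal base R sketch of the simplest estimator, the plug-in (empirical) estimate from discretized data. The function names `mi_empirical` and `discretize_eqwidth` are hypothetical, and this estimator is just one of several that the cited study compares.

```r
# Plug-in (empirical) estimate of mutual information in nats from two discrete vectors.
mi_empirical <- function(x, y) {
  pxy <- unclass(table(x, y)) / length(x)   # empirical joint distribution
  px <- rowSums(pxy); py <- colSums(pxy)    # marginals
  nz <- pxy > 0                             # 0 * log(0) is taken as 0
  sum(pxy[nz] * log(pxy[nz] / outer(px, py)[nz]))
}

# Equal-width discretization of a continuous expression profile into `bins` levels.
discretize_eqwidth <- function(v, bins = 5) cut(v, breaks = bins, labels = FALSE)

x <- rep(c(0, 1), 50)          # balanced binary variable
y <- rep(c(0, 0, 1, 1), 25)    # independent of x by construction
mi_xx <- mi_empirical(x, x)    # equals the entropy of x, log(2)
mi_xy <- mi_empirical(x, y)    # zero for independent variables
```

The plug-in estimate is biased upwards for small samples, which is one reason why the choice of estimator matters in practice.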
ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks)
ARACNE [5, 23]
Start with a fully connected adjacency matrix A:
If $I_{ij} = 0$ (hypothesis test) $\Rightarrow$ the connection between gene $X_i$ and $X_j$
is deleted, $A_{ij} = A_{ji} = 0$.
DPI (data processing inequality): testing all gene triplets (three
genes with significant MI values) such that, for each triplet (ijk),
the edge corresponding to the lowest mutual information value
$I_1 = I_{i'j'}$, with $(i'j') = \operatorname{argmin}\{I_{ij}, I_{jk}, I_{ik}\}$, is eliminated from the
adjacency matrix if it is lower than the second lowest MI value $I_2$
multiplied by a factor,
$$A_{i'j'} = A_{j'i'} = \begin{cases} 0 & \text{if } I_{i'j'} \le (1 - \epsilon)\, I_2 \\ \text{unchanged} & \text{otherwise.} \end{cases}$$
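The DPI step can be sketched in base R as below. This is an illustration, not the ARACNE reference implementation: the function name `dpi_prune` is hypothetical, the input is assumed to be a symmetric MI matrix in which nonsignificant entries were already set to 0, and edges are marked on the original matrix and removed at the end so that the result does not depend on the order of the triplets.

```r
# DPI pruning step (sketch). mim: symmetric MI matrix, nonsignificant entries already 0;
# eps: tolerance in [0, 1).
dpi_prune <- function(mim, eps = 0) {
  p <- nrow(mim)
  remove <- matrix(FALSE, p, p)
  for (i in 1:(p - 2)) for (j in (i + 1):(p - 1)) for (k in (j + 1):p) {
    pairs <- rbind(c(i, j), c(j, k), c(i, k))
    vals <- mim[pairs]                           # the three MI values of the triplet
    if (any(vals == 0)) next                     # triplet must be fully connected
    ord <- order(vals)
    if (vals[ord[1]] <= (1 - eps) * vals[ord[2]]) {   # I_1 <= (1 - eps) I_2
      e <- pairs[ord[1], ]
      remove[e[1], e[2]] <- remove[e[2], e[1]] <- TRUE
    }
  }
  A <- (mim > 0) * 1
  A[remove] <- 0
  diag(A) <- 0
  A
}

# Toy triplet: I_12 = 0.8, I_23 = 0.6, I_13 = 0.3.
mim <- matrix(0, 3, 3)
mim[1, 2] <- mim[2, 1] <- 0.8
mim[2, 3] <- mim[3, 2] <- 0.6
mim[1, 3] <- mim[3, 1] <- 0.3
A_strict <- dpi_prune(mim, eps = 0)      # weakest edge 1-3 is removed
A_tolerant <- dpi_prune(mim, eps = 0.6)  # 0.3 > 0.4 * 0.6, so edge 1-3 is kept
```

A larger tolerance makes the pruning less aggressive, so more edges of a triplet survive.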
(figure: gene triplet with MI values $I_1$, $I_2$, $I_3$, assuming $I_1 < I_2 < I_3$. DPI: the edge with $I_1$ is deleted. DPI with epsilon: delete the edge with a certain probability, controlled by epsilon.)
C3NET (Conservative Causal Core)
C3NET consists of two (+1) main steps [1].
1. estimate mutual information values among all genes
2. select for each gene just one edge with maximum mutual information value
3. elimination of nonsignificant edges (testing for MI = 0)
The first step is similar to RN [6], ARACNE [23] or CLR [15] and is essential
for eliminating nonsignificant links, according to a chosen significance
level $\alpha$, between gene pairs. In the second step, the most significant
link for each gene is selected. This link corresponds also to the highest
MI value among the neighbor edges for each gene. This implies that
the highest possible number of edges that can be inferred by C3NET is
equal to the number of genes under consideration.
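The selection step can be sketched in a few lines of base R. This is not the code of the c3net package: the function name `c3net_core` is hypothetical, and the input is assumed to be a symmetric MI matrix in which nonsignificant values were already set to 0.

```r
# C3NET core step (sketch). mim: symmetric MI matrix, nonsignificant entries already 0.
c3net_core <- function(mim) {
  p <- nrow(mim)
  diag(mim) <- 0
  A <- matrix(0, p, p)
  for (i in seq_len(p)) {
    j <- which.max(mim[i, ])                    # partner with maximum MI for gene i
    if (mim[i, j] > 0) A[i, j] <- A[j, i] <- 1  # keep only this one edge per gene
  }
  A
}

# Toy MI matrix for 4 genes.
mim <- matrix(0, 4, 4)
mim[upper.tri(mim)] <- c(0.9, 0.5, 0.4, 0.2, 0.1, 0.3)  # I12, I13, I23, I14, I24, I34
mim <- mim + t(mim)
A <- c3net_core(mim)
n_edges <- sum(A) / 2
```

In the toy example genes 1 and 2 select each other, so the four genes yield only three distinct edges, which illustrates why the number of genes is an upper bound on the number of edges.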
(figure: principal working mechanism of C3NET. Each of the 8 genes is connected to the gene with which it has maximum MI, so 8 genes give at most 8 edges: conservative!)
principle of BC3NET:
bagging (= bootstrap aggregation - Leo Breiman 1996) C3NET
Bayesian version of C3NET with noninformative priors
Bagging statistical network inference from large-scale gene expression data de Matos Simoes R & Emmert-Streib F,
PLoS One. 2012;7(3):e33624.
Software:
R package: c3net
Implementation of C3NET
Available from the CRAN repository.
Inferring the conservative causal core of gene regulatory networks, G. Altay & F. Emmert-Streib, BMC Systems Biology
4:132 (2010).
Structural Influence of gene networks on their inference: Analysis of C3NET, G. Altay & F. Emmert-Streib, Biol Direct.
2011 Jun 22;6:31.
Software:
R package: bc3net
Implementation of BC3NET
Available from the CRAN repository.
Bagging statistical network inference from large-scale gene expression data de Matos Simoes R & Emmert-Streib F,
PLoS One. 2012;7(3):e33624.
MRNET (maximum relevance, minimum redundancy)
MRNET [25] is an iterative algorithm that identifies potential interaction
partners of a target gene Y that maximize a scoring function.
$$X_j^s = \operatorname*{argmax}_{X_j \in V \setminus S} \, s_j \qquad (1)$$
$$s_j = I(X_j; Y) - \frac{1}{|S|} \sum_{X_k \in S} I(X_j; X_k). \qquad (2)$$
When a gene, $X_j$, is found with a score that maximizes Eqn. 1 and $s_j$ is
above a threshold, $s_0$, then this gene is added to the set S.
Basic idea: maximal relevance (first term in Eqn. 2) for Y, but
minimum redundancy (second term in Eqn. 2) with respect to the
already found interaction partners in the set S. Starting with a fully
connected network, MRNET successively removes edges between Y
and V\S that have not been found by the algorithm.
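The greedy selection for a single target gene can be sketched in base R as follows. This is not the minet implementation: the function name `mrmr_select`, the default threshold `s0 = 0` and the toy MI matrix are assumptions, and the sketch stops at the first candidate whose best score does not exceed the threshold.

```r
# MRMR selection for one target gene y (sketch). mim: symmetric MI matrix; s0: score threshold.
mrmr_select <- function(mim, y, s0 = 0) {
  cand <- setdiff(seq_len(nrow(mim)), y)   # V \ S, initially all genes except the target
  S <- integer(0)
  while (length(cand) > 0) {
    relevance <- mim[cand, y]                                     # I(X_j; Y)
    redundancy <- if (length(S) > 0) rowMeans(mim[cand, S, drop = FALSE]) else 0
    score <- relevance - redundancy                               # s_j of Eqn. 2
    best <- which.max(score)
    if (score[best] <= s0) break
    S <- c(S, cand[best])
    cand <- cand[-best]
  }
  S
}

# Toy example: gene 3 is relevant for target 1 but redundant with genes 2 and 4.
mim <- matrix(0, 4, 4)
mim[1, 2] <- mim[2, 1] <- 0.8
mim[1, 3] <- mim[3, 1] <- 0.7
mim[1, 4] <- mim[4, 1] <- 0.1
mim[2, 3] <- mim[3, 2] <- 0.75
mim[3, 4] <- mim[4, 3] <- 0.8
S <- mrmr_select(mim, y = 1)
```

Gene 2 is selected first for its high relevance, gene 4 second because it adds non-redundant information, and gene 3 is rejected although its MI with the target is high.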
(figure: principle of MRNET. A candidate gene $X_j$ is scored by $s_j = I(X_j; Y) - \frac{1}{|S|} \sum_{X_k \in S} I(X_j; X_k)$ with respect to the target gene Y and the genes already in S.)
Software:
R package: minet
Implementation of MRNET, ARACNE, CLR
Available from Bioconductor.
P. Meyer, F. Lafitte and G. Bontempi, minet: A R/Bioconductor Package for Inferring Large Transcriptional Networks Using
Mutual Information, BMC Bioinformatics 9:461 (2008).
How about Bayesian Networks?
A J Hartemink, Reverse engineering gene regulatory networks, Nature Biotechnology 23, 554 - 555 (2005).
Very high computational complexity (NP-complete) [9].
What are Bayesian networks again?
J. Pearl, Probabilistic reasoning in intelligent systems, 1988.
Graphical models:
Graphical models is a term that refers to the separation of a joint
probability distribution (over p variables) into conditional
probabilities.
The resulting conditional probability distributions are connected to
a network representation.
Network inference methods are related to graphical models.
name/publication key     method                                  software
correlation-based:
coexpression [7, 34]     correlation                             WGCNA
Asymmetric-N [8]         correlation, ranking                    no
SEM [33]                 + , genetic algorithm                   no
ParCorA [11]             partial correlation first order         ParCorA
GGM [27]                 full order partial correlation          GeneNet
BN [18]                  partial correlation n-th order          yes (1)
information-based:
RN [6]                   mutual information                      MINET
ARACNE [23]              mutual information, DPI                 MINET
CLR [15]                 mutual information, background          MINET
C3NET [1, 3]             maximum mutual information              c3net
SA-CLR [31]              mutual information, synergy             no
MRNET [25]               mutual information                      MINET
MI3 [31]                 three-way mutual information            MI3
CMI [29]                 conditional mutual information          no
MI-CMI [21]              MI, conditional mutual information      no
[35]                     MI, conditional mutual information      no
BC3NET [12]              bootstrap aggregation of C3NET          bc3net
(1) Available from the authors.
Multiple hypothesis testing
There are two types of methods providing either,
p-value for each edge
score for each edge
If p-value, apply multiple testing correction.
Problem:
high correlation $\Rightarrow$ be conservative (Bonferroni correction)
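For methods that return a p-value per edge, the correction is a one-liner in base R with `p.adjust`. The p-values and edge names below are made up for illustration, and the 0.05 level is an assumption.

```r
# Bonferroni correction of edge-wise p-values (sketch with made-up p-values).
pvals <- c(e12 = 0.0001, e13 = 0.004, e23 = 0.03, e24 = 0.2)
p_bonf <- p.adjust(pvals, method = "bonferroni")   # p * number of tests, capped at 1
edges <- names(p_bonf)[p_bonf < 0.05]              # edges kept after correction
```

Edge e23 would pass an uncorrected 0.05 level but is rejected after correction, which is the conservative behavior referred to above.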
Example: Correlations within biological pathways
Tripathi S, Emmert-Streib F (2012) Assessment Method for a Power Analysis to Identify Differentially Expressed
Pathways. PLoS ONE 7(5): e37510
Part II
Evaluation measures
Overview of part II
difficulty of assessing the performance of an inference method
different types of error measures:
general statistical measures
ontology-based measures
network-based measures
(figure: biological data from a cell type are used to obtain an inferred network, which is evaluated against the true network)
(figure: as before, with literature & databases providing a reference network for the evaluation)
(figure: simulated data from a simulated cell type are used to obtain an inferred network, which is evaluated against the true network)
Different evaluation measures:
General statistical measures
Ontology-based measures
Network-based measures
General statistical measures
basic entities: TP, FP, TN, FN
sensitivity = TP / (TP + FN)
specificity = TN / (TN + FP)
complementary specificity = 1 - specificity = FP / (TN + FP)
precision = P = TP / (TP + FP)
recall = R = sensitivity
accuracy = (TP + TN) / (TP + TN + FP + FN)
Obtained by comparison of an inferred network with the true network
underlying the data.
The first measure is the area under the curve for the receiver operator
characteristics (AUC-ROC) [16]. The ROC curve represents the
sensitivity as a function of the complementary specificity, obtained by
using various threshold values $\theta$, instead of one specific value, that the
algorithm depends on. This leads to a $\theta$-dependence of all quantities
listed above and, hence, allows one to obtain a functional behavior among
these measures. The second measure is the area under the
precision-recall curve (AUC-PR), obtained similarly to AUC-ROC, and
the third is the F-score,
$$F = 2\,\frac{P R}{P + R}.$$
F assumes values in [0, 1].
F = 0: very bad (worst)
F = 1: very good (best)
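As a worked example of these measures, the following base R sketch counts TP, FP, TN and FN for a single inferred network and derives precision, recall and the F-score. The function name `network_measures` and the two toy networks are assumptions; undirected networks are assumed, so each gene pair is counted once.

```r
# Confusion counts and derived measures for undirected networks (sketch).
network_measures <- function(A_inf, A_true) {
  ut <- upper.tri(A_true)                       # count each gene pair once
  est <- A_inf[ut] != 0; ref <- A_true[ut] != 0
  TP <- sum(est & ref);   FP <- sum(est & !ref)
  TN <- sum(!est & !ref); FN <- sum(!est & ref)
  P <- TP / (TP + FP); R <- TP / (TP + FN)
  c(TP = TP, FP = FP, TN = TN, FN = FN,
    sensitivity = R, specificity = TN / (TN + FP),
    precision = P, F = 2 * P * R / (P + R))
}

# Helper: symmetric 0/1 adjacency matrix from a two-column edge list.
adj <- function(p, edges) { A <- matrix(0, p, p); A[edges] <- 1; A + t(A) }

A_true <- adj(4, rbind(c(1, 2), c(2, 3), c(3, 4)))
A_inf  <- adj(4, rbind(c(1, 2), c(2, 3), c(1, 4)))
m <- network_measures(A_inf, A_true)
```

Repeating this computation for a range of thresholds gives the threshold-dependent curves from which AUC-ROC and AUC-PR are obtained.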
Ontology-based measures
Another strategy to assess the performance of an inference method is
based on the biological relevance of the inferred network. In general, it
is assumed that genes in a gene expression network are preferentially
linked to genes involved in similar biological processes [32]. There are
several publicly available resources of biological knowledge which can
be used to test whether this holds true, for example, the Gene
Ontology database (GO) [4] or KEGG [20].
There are also several curated organism-specific knowledge
databases, such as RegulonDB (E.coli) [17] and MIPS (yeast) [24].
Functional congruence of clusters of co-expressed genes is a popular
validation measure for clustering algorithms [10] and in principle can
be used for the validation of inferred networks.
Network-based measures
Problems with general statistical measures:
global assessment of the whole network
Are interactions of TFs better to infer than others?
Is a three-gene forward motif more difficult to infer?
local subnetwork
F-score is a random variable
mean value of F-score
variance of F-score
distribution of F-score
ensemble (Bootstrap ensemble)
Network-based measures
(figure: ensemble data $D_1(G), D_2(G), \ldots, D_b(G)$, obtained from experiments, simulation or bootstrap, give the estimated networks $G_1, G_2, \ldots, G_b$, which are aggregated into the probabilistic network $G^P$)
probabilistic network $G^P$:
weight of i-j = (number of times edge i-j is present in $\{G_1, \ldots, G_b\}$)/b
With the true network (reference network) G:
TPR of i-j = (number of times edge i-j is present in $\{G_1, \ldots, G_b\}$)/b, for an edge i-j of G
TNR of i-k = (number of times no edge i-k is present in $\{G_1, \ldots, G_b\}$)/b, for a pair i-k without an edge in G
knowledge about TP and TN edges
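The aggregation of an ensemble into edge frequencies can be sketched in base R as follows. The function name `edge_frequency`, the helper `adj` and the three toy networks are assumptions for illustration; in practice the ensemble would come from bootstrap samples, simulations or experiments.

```r
# Edge weights of the probabilistic network from an ensemble of b estimated networks (sketch).
edge_frequency <- function(networks) Reduce(`+`, networks) / length(networks)

# Helper: symmetric 0/1 adjacency matrix from a two-column edge list.
adj <- function(p, edges) { A <- matrix(0, p, p); A[edges] <- 1; A + t(A) }

G1 <- adj(3, rbind(c(1, 2)))
G2 <- adj(3, rbind(c(1, 2), c(2, 3)))
G3 <- adj(3, rbind(c(1, 2), c(1, 3)))
GP <- edge_frequency(list(G1, G2, G3))          # entry i-j: fraction of networks with edge i-j

G_true <- adj(3, rbind(c(1, 2), c(2, 3)))       # toy reference network
tpr <- GP[G_true == 1 & upper.tri(G_true)]      # per-edge TPR for the true edges 1-2 and 2-3
tnr <- 1 - GP[G_true == 0 & upper.tri(G_true)]  # per-pair TNR for the non-edge 1-3
```

Unlike a single global F-score, these per-edge rates show which individual interactions are easy or hard to infer.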
Part III
Data
Overview of part III
inferring regulatory networks from
biological data (DNA microarray, RNA-seq)
discuss different types of biological data
simulated data
discuss simulation principles
Biological data
(figure: a sample of biological cells is profiled with DNA microarrays or RNA-seq, measuring the abundance of mRNAs)
Different types of data:
observational data
experimental data (intervention, perturbation)
Frequently, only observational data are available because
experimental data cannot be collected for ethical reasons (especially
when humans are involved).
Different types of data:
steady-state data
time series (longitudinal) data
Frequently, only steady-state data are available.
Simulated data
Simplified kinetic equations (ordinary differential equations (ODE)):
$$\frac{d\,\mathrm{mRNA}_i}{dt} = S_i(R) - D_i^{\mathrm{mRNA}}\,\mathrm{mRNA}_i$$
$$\frac{d\,\mathrm{prot}_i}{dt} = T_i^{P}\,\mathrm{mRNA}_i - D_i^{P}\,\mathrm{prot}_i$$
(figure: DNA is transcribed into mRNA, which is translated into protein)
Signaling function $S_i(R)$, built of Hill functions:
$$\frac{K^n}{R_i^n + K^n} \quad \text{and} \quad \frac{R_i^n}{R_i^n + K^n}$$
Binding affinity: K
Hill coefficient: n
Hill functions:
$$I(R) = \frac{K^n}{R^n + K^n} \ \text{(inhibitor)}, \qquad A(R) = \frac{R^n}{R^n + K^n} \ \text{(activator)}$$
(figure: two panels showing the Hill functions versus R on [0, 1]. Example: K = 0.5 (fixed), n = 1 (blue), n = 2 (red), n = 10 (green))
(figure: two panels showing the Hill functions versus R on [0, 1]. Example: n = 10 (fixed), K = 0.25 (blue), K = 0.5 (red), K = 0.9 (green))
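The curves of the two examples can be reproduced with a short base R sketch. The function names `hill_inhibitor` and `hill_activator` are hypothetical; the parameter values are those given in the examples.

```r
# Hill functions for an inhibitor and an activator (sketch).
hill_inhibitor <- function(R, K, n) K^n / (R^n + K^n)
hill_activator <- function(R, K, n) R^n / (R^n + K^n)

R <- seq(0, 1, by = 0.01)
# First example: K = 0.5 fixed, Hill coefficient n = 1, 2, 10 (one column per n).
act_by_n <- sapply(c(1, 2, 10), function(n) hill_activator(R, K = 0.5, n = n))
# Second example: n = 10 fixed, binding affinity K = 0.25, 0.5, 0.9 (one column per K).
act_by_K <- sapply(c(0.25, 0.5, 0.9), function(K) hill_activator(R, K = K, n = 10))
# matplot(R, act_by_n, type = "l", col = c("blue", "red", "green"), ylab = "Hill function")
```

The Hill coefficient n controls the steepness of the switch and K its position: every curve passes through 0.5 at R = K, and the inhibitor is the mirror image of the activator, I(R) = 1 - A(R).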
R is a vector of TFs that control the transcription of gene i (plus
external signals):
$$R_i = \sum_j w_{ij} R_j$$
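A minimal base R sketch of how such kinetic equations can be integrated numerically for a two-gene cascade. Everything specific here is an assumption made for illustration: the function names, the use of simple Euler steps, unit rate constants, an activating Hill function with K = 0.5 and n = 2 as the signaling function, and the interpretation of the regulator levels as protein concentrations.

```r
# Euler integration of the simplified kinetic equations for a small network (sketch).
hill_activator <- function(R, K = 0.5, n = 2) R^n / (R^n + K^n)

simulate_network <- function(W, ext, t_end = 50, dt = 0.01,
                             D_mRNA = 1, T_P = 1, D_P = 1) {
  p <- nrow(W)
  mRNA <- rep(0, p); prot <- rep(0, p)
  for (step in seq_len(round(t_end / dt))) {
    R <- pmax(as.vector(W %*% prot) + ext, 0)     # weighted regulator input plus external signal
    d_mRNA <- hill_activator(R) - D_mRNA * mRNA   # transcription minus mRNA degradation
    d_prot <- T_P * mRNA - D_P * prot             # translation minus protein degradation
    mRNA <- mRNA + dt * d_mRNA
    prot <- prot + dt * d_prot
  }
  list(mRNA = mRNA, prot = prot)
}

# Gene 1 is driven by an external signal; protein 1 activates gene 2.
W <- matrix(c(0, 0,
              1, 0), nrow = 2, byrow = TRUE)
state <- simulate_network(W, ext = c(1, 0))
```

With these settings the system relaxes to a steady state in which the mRNA level of each gene equals the value of its signaling function; such steady-state values, collected over many runs, are what the inference methods take as input.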
(figure: gene i within a part of the regulatory network, with an external signal and the background network)
Each gene i has its own network-dependent vector $w_{i,[\cdot]}$ controlling its
transcription.
There are many other ways to simulate expression data, based on
different principles:
with or without protein level
with or without time delay
deterministic or stochastic
with or without external signals
Software packages to simulate expression data:
name/publication     special features
netsim [13]          R package
GNW [22]             various dynamic models
GRENDEL [19]         uses kinetic parameters from yeast
SGNSim [26]          stochastic simulation algorithm
SynTReN [30]         steady-state
Part IV
Practical examples
Overview of part IV
discussing several practical examples, possible problems and
difficulties
Getting started
Suppose: You have 300 arrays of expression data with > 10,000
genes.
How to check if the inferred network for a large-scale expression
data set makes sense?
Not a good start!
Better: Start with simulations to assess the implemented
algorithm.
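Simple example with C3NET using R: 50 genes
library("igraph"); library("netsim"); library("minet"); library("c3net")
# 1. generate network: random directed graph with N genes
N <- 50; prob <- 0.035
g <- erdos.renyi.game(n = N, p.or.m = prob, directed = TRUE, loops = TRUE)
W <- as.matrix(get.adjacency(g, type = "both"))   # dense weight matrix
# turn about half of the interactions into inhibitions (weight -1)
ind <- which(W != 0, arr.ind = TRUE)
for (i in seq_len(nrow(ind))) {
  if (runif(1) < 0.5) W[ind[i, 1], ind[i, 2]] <- -1
}
# 2. simulate data: E samples, each the expression profile at time t_end
t_end <- 100; E <- 300
dnew <- matrix(0, nrow = N, ncol = E)
for (i in 1:E) {
  aux <- simulateprofiles(weights = W, act.fun = "sigmoidal", method = "lsoda", times = c(0, 1, t_end))
  dnew[, i] <- aux[, 3]
}
# 3. infer network with C3NET from the mutual information matrix
net <- c3(build.mim(discretize(t(dnew), disc = "equalwidth"), estimator = "mi.mm"))
# compare with the true network; columns of res: thrsh, TP, FP, TN, FN
res <- validate(net, abs(W))
P <- res[, 2] / (res[, 2] + res[, 3])   # precision = TP/(TP+FP)
R <- res[, 2] / (res[, 2] + res[, 5])   # recall = TP/(TP+FN)
Fscore <- 2 * P * R / (P + R)
mF <- max(Fscore, na.rm = TRUE)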
(figure: the three parts of the code are 1. generate network, 2. simulate data, 3. infer network. They correspond to the workflow: the true network of a simulated cell type (1. Algorithm) produces simulated data (2. Algorithm), from which a network is inferred (3. Algorithm) and evaluated against the true network.)
Visualization of the network with igraph:
tkplot(g, vertex.size=3.0, vertex.color = "blue")
Result: maximal F-score: 0.43
Question: Is an F-score of 0.43 a high or low value?
(figure: histogram of 100 F-scores obtained from randomized data; x-axis: F-score from 0.05 to 0.20, y-axis: frequency from 0 to 20)
Using randomized data as input for the inference algorithm gives on
average $F_{rand} \approx 0.12$.
Question: Is the result for $F_{rand}$ dependent on or independent of the
underlying (true) network G?
Example:
1. network G with 50 genes constructed with p = 0.015 (previously p = 0.035)
2. generate randomized data
3. C3NET
$F_{rand} \approx 0.07$
$\Rightarrow$ the randomized F-score is a function of the network, $F_{rand}(G)$
Let's take a closer look:
> net
V1 V2 V3 V4 V5 V6
V1 0.000000000 0.0244688433 0.027914887 0.013969485 0.000000000 0.000000000
V2 0.024468843 0.0000000000 0.000000000 0.048966411 0.000000000 0.023253497
V3 0.027914887 0.0000000000 0.000000000 0.012782872 0.000000000 0.000000000
V4 0.013969485 0.0489664111 0.012782872 0.000000000 0.000000000 0.013892107
V5 0.000000000 0.0000000000 0.000000000 0.000000000 0.000000000 0.002506736
V6 0.000000000 0.0232534968 0.000000000 0.013892107 0.002506736 0.000000000
V7 0.000000000 0.0000000000 0.089198730 0.021163258 0.000000000 0.012747563
V8 0.015237811 0.0175343827 0.000000000 0.000000000 0.125830242 0.000000000
V9 0.000000000 0.0000000000 0.000000000 0.007453967 0.000000000 0.000000000
V10 0.044672679 0.9857632600 0.984655214 0.040205932 0.000000000 0.000000000
V11 0.057223037 0.0105832890 0.002417762 0.000000000 0.969407882 0.000000000
V12 0.034959606 0.0636157198 0.000000000 0.000000000 0.089818791 0.020046922
V13 0.000000000 0.0287671936 0.000000000 0.009385961 0.061685278 0.000000000
V14 0.000000000 0.0000000000 0.000000000 0.000000000 0.024414013 0.000000000
V15 0.000000000 0.0003002626 0.011006357 0.000000000 0.000000000 0.014411812
V16 0.038310232 0.0089231394 0.029520004 0.015764865 0.000000000 0.000000000
V17 0.063909136 0.0000000000 0.000000000 0.003270828 0.022446779 0.040262031
V18 0.013114730 0.0303191134 0.019913482 0.010910713 0.000000000 0.000000000
V19 0.000000000 0.0000000000 0.000000000 0.013000233 0.012815517 0.000000000
inferred network (part)
apply threshold -> network (unweighted, undirected)
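The thresholding step can be sketched in base R as follows. This is a minimal illustration rather than the tutorial's own code: the 4-gene matrix `w` and the cut-off `0.02` are hypothetical values, and treating a pair as connected when either direction exceeds the cut-off is an assumption.

```r
# Hypothetical weighted score matrix for 4 genes (illustrative values only).
genes <- paste0("V", 1:4)
w <- matrix(c(0.000, 0.024, 0.028, 0.014,
              0.024, 0.000, 0.000, 0.049,
              0.028, 0.000, 0.000, 0.013,
              0.014, 0.049, 0.013, 0.000),
            nrow = 4, byrow = TRUE, dimnames = list(genes, genes))

# Assumed cut-off; in practice it is chosen by validation against a reference.
threshold <- 0.02

# Symmetrize by taking the larger of the two directions (an assumption),
# then keep an edge wherever the score exceeds the threshold.
w_sym <- pmax(w, t(w))
adj <- (w_sym > threshold) * 1L
diag(adj) <- 0L

# Edge list of the unweighted, undirected network (upper triangle only).
idx <- which(adj == 1L & upper.tri(adj), arr.ind = TRUE)
edges <- data.frame(from = genes[idx[, "row"]], to = genes[idx[, "col"]])
print(edges)
```

Raising the threshold removes edges and lowering it adds them, which is why the choice of cut-off has to be evaluated separately.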
> res <- validate( net, W)
  thrsh TP   FP   TN FN  F-score
1  0.00 67 2433    0  0   0.0522
2  0.02 26  742 1691 41   0.0622
3  0.04 14  394 2039 53   0.0589
4  0.06  8  216 2217 59   0.0549
5  0.08  4  118 2315 63   0.0423
apply various thresholds -> various networks (unweighted, undirected) -> error measures, e.g., F-score
P = TP/(TP+FP)
R = TP/(TP+FN)
F = 2 PR/(P+R)
maximum F-score (0.0622) -> optimal threshold (0.02)
assumption: reference network is available
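The F-score column can be recomputed from the confusion counts in the table. The sketch below copies the counts by hand into a data frame (the object name `res` and the added columns `P`, `R`, `F` are choices made for this example, not the return format of `validate`), applies the three formulas, and picks the threshold with the largest F-score.

```r
# Confusion counts copied from the validation table.
res <- data.frame(thrsh = c(0.00, 0.02, 0.04, 0.06, 0.08),
                  TP = c(67, 26, 14, 8, 4),
                  FP = c(2433, 742, 394, 216, 118),
                  TN = c(0, 1691, 2039, 2217, 2315),
                  FN = c(0, 41, 53, 59, 63))

# Precision, recall and their harmonic mean (F-score) per threshold.
res$P <- res$TP / (res$TP + res$FP)
res$R <- res$TP / (res$TP + res$FN)
res$F <- 2 * res$P * res$R / (res$P + res$R)

# The optimal threshold is the one that maximizes the F-score.
best <- res[which.max(res$F), ]
print(round(res[, c("thrsh", "P", "R", "F")], 4))
print(best$thrsh)
```

The recomputed values agree with the slide up to rounding in the last digit, and the maximum falls at the threshold 0.02.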
Comparison of Methods
B cell lymphoma
Part V
Applications
Comparison of Methods
Figure: Boxplots for F-scores (y-axis: F, range 0.20 to 0.55) for C3N50, C3N200, AR50, AR200, MR50, MR200, RN50, RN200, CLR50 and CLR200 (x-axis). A subnetwork of the Yeast GRN with 100 genes is used for the simulations. Ensemble size is N = 300 [1].
Figure: Subnetwork of yeast consisting of 100 genes, sample size is 200. (Network plot; nodes are labelled with yeast gene and regulator names such as SIN3, SWI4_SWI6, PHO4 and GAL1.)
Table: Summary of motif statistics for ARACNE, CLR, MRNET and RN [2].
method    measure  motif 1  motif 2  motif 3  motif 4
ARACNE    #m       40       171      446      10
          p        0.591    0.352    0.530    0.156
          σ(p)     0.15     0.04     0.18     0.12
CLR       #m       40       171      446      10
          p        0.506    0.378    0.480    0.171
          σ(p)     0.131    0.072    0.137    0.194
MRNET     #m       40       171      446      10
          p        0.568    0.326    0.582    0.176
          σ(p)     0.1576   0.0083   0.2237   0.1434
RN        #m       40       171      446      10
          p        0.511    0.321    0.515    0.151
          σ(p)     0.121    0.010    0.152    0.112
CLR (EC)  #m       2105     321896   1315     997
          p        0.3625   0.3353   0.3355   0.0558
          σ(p)     0.103    0.026    0.031    0.168
[Figure: the four motif types 1 to 4, each drawn on three genes A, B and C]
B cell lymphoma
Case study: B-cell lymphoma
[1] Küppers, R.
Mechanisms of B-cell lymphoma pathogenesis.
Nat Rev Cancer 5, 251-62 (2005)
B cell dataset and the analysis of MYC
344 samples (GSE2350, Gene Expression Omnibus)
B cell chronic lymphocytic leukemia (BCLL), diffuse large B cell lymphomas (DLBCL), Burkitt lymphoma (BL), MCL etc.
ARACNE network: 6000 genes with 129,000 interactions
56 genes directly connected to MYC (among 5% largest hubs)
ChIP analysis for MYC in Ramos Burkitt lymphoma cell line
Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, Califano A, Reverse engineering of regulatory networks in human B cells, Nat Genet 2005, 37(4):382-390
Data Processing
344 samples Affymetrix hgu95a, hgu95a_v2
RMA and quantile normalization (Irizarry et al. 2003)
median expression for Entrez gene identifiers with multiple probesets
9,684 genes
C3NET (Altay et al. 2010)
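The probeset-to-gene step above (one median expression value per Entrez gene identifier) could look roughly like this in base R. It is a sketch under stated assumptions: the matrix `expr`, the mapping vector `entrez` and the helper `collapse_median` are hypothetical names and toy values, not the data or code used for the B-cell analysis.

```r
# Hypothetical expression matrix: 5 probesets (rows) x 3 samples (columns).
expr <- matrix(c(7.1, 7.4, 6.9,
                 7.5, 7.0, 7.3,
                 5.2, 5.0, 5.4,
                 9.8, 9.6, 9.9,
                 9.0, 9.4, 9.1),
               nrow = 5, byrow = TRUE,
               dimnames = list(paste0("probe", 1:5), paste0("s", 1:3)))

# Hypothetical probeset-to-Entrez mapping; genes "100" and "300"
# are each measured by two probesets.
entrez <- c("100", "100", "200", "300", "300")

# For every gene, take the per-sample median over its probesets.
# Assumes at least two samples, so that sapply returns a matrix.
collapse_median <- function(x, ids) {
  groups <- split(seq_len(nrow(x)), ids)
  t(sapply(groups, function(rows) apply(x[rows, , drop = FALSE], 2, median)))
}

gene_expr <- collapse_median(expr, entrez)
print(gene_expr)
```

The result has one row per gene identifier and one column per sample, which is the shape an inference method would then take as input.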
C3NET network components inferred from the B-cell lymphoma dataset
[Figure: histogram of component sizes; x-axis: component size (log scale, ticks 3, 7, 20, 55, 148, 403, 1097), y-axis: Frequency (0 to 100)]
Total: 463 components (≥ 2 genes); the 5% largest have > 100 genes (25 components)
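Component sizes like those in the histogram can be obtained from an edge list by labelling connected components. The following base R sketch uses a breadth-first search; the edge list and the function name `component_sizes` are hypothetical and only illustrate the idea. A graph package such as igraph would be the more usual choice for networks of realistic size.

```r
# Hypothetical undirected edge list (not taken from the B-cell network).
edges <- data.frame(from = c("g1", "g2", "g4", "g6", "g6"),
                    to   = c("g2", "g3", "g5", "g7", "g8"),
                    stringsAsFactors = FALSE)

component_sizes <- function(edges) {
  nodes <- unique(c(edges$from, edges$to))
  # Adjacency list: for every node, the names of its neighbours.
  nbrs <- split(c(edges$to, edges$from),
                factor(c(edges$from, edges$to), levels = nodes))
  label <- setNames(rep(NA_integer_, length(nodes)), nodes)
  current <- 0L
  for (start in nodes) {
    if (!is.na(label[[start]])) next
    current <- current + 1L
    label[[start]] <- current
    queue <- start
    # Breadth-first search labels everything reachable from 'start'.
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      cand <- nbrs[[node]]
      new <- unique(cand[is.na(label[cand])])
      label[new] <- current
      queue <- c(queue, new)
    }
  }
  # Number of genes per component label.
  table(label)
}

sizes <- component_sizes(edges)
print(sort(as.integer(sizes), decreasing = TRUE))
```

A histogram of these sizes, for example with `hist(log(as.integer(sizes)))`, gives a plot of the kind shown on the slide.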
The giant connected component of the B-cell gene network
[Figure: network plot of the giant connected component; labelled genes: NR5A1, SNX29, CALCA, SLC6A7, P2RX6, CLDN9, TGM4, CACNA1F]
884 genes with 883 interactions
Cellular component GO enrichment analysis of the giant connected component
GO.ID       Term                                   Annotated  Significant  Expected  p-value
GO:0016021  integral to membrane                   2201       282          193.28    2.2e-14
GO:0031224  intrinsic to membrane                  2260       285          198.46    1.2e-13
GO:0044459  plasma membrane part                   1388       195          121.89    4e-13
GO:0044425  membrane part                          2739       327          240.53    1.1e-12
GO:0005886  plasma membrane                        2084       264          183.01    1.3e-12
GO:0005887  integral to plasma membrane            900        131          79.03     9e-10
GO:0031226  intrinsic to plasma membrane           915        132          80.35     1.4e-09
GO:0016020  membrane                               3462       372          304.02    4.3e-08
GO:0034702  ion channel complex                    125        31           10.98     6.7e-08
GO:0005576  extracellular region                   1077       136          94.58     3e-06
GO:0034703  cation channel complex                 80         21           7.03      3.4e-06
GO:0034704  calcium channel complex                22         10           1.93      6.2e-06
GO:0005903  brush border                           31         11           2.72      3.6e-05
GO:0005891  voltage-gated calcium channel complex  17         8            1.49      4e-05
GO:0030054  cell junction                          327        48           28.72     0.00024
top 15 Cellular Component terms
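The Expected and p-value columns of such an enrichment table can be illustrated with a hypergeometric (one-sided Fisher) test. The sketch below is not the procedure used for the slide, which is not stated: the universe size `N` and the number of selected genes `n` are assumed values, chosen only so that n/N is close to the ratio Expected/Annotated of about 0.088 seen in the table. The two term counts are taken from the row for calcium channel complex.

```r
# Assumed sizes (hypothetical): annotated gene universe N, selected genes n.
N <- 10000   # genes with a GO annotation in the universe (assumption)
n <- 880     # selected genes, e.g. members of a network component (assumption)

# Counts for one term, taken from the row "calcium channel complex".
annotated   <- 22   # genes in the universe annotated with the term
significant <- 10   # selected genes annotated with the term

# Expected number of annotated genes among the selected ones
# if selection and annotation were independent.
expected <- annotated * n / N

# One-sided hypergeometric p-value: P(X >= significant).
p_value <- phyper(significant - 1, annotated, N - annotated, n,
                  lower.tail = FALSE)

cat("expected:", round(expected, 2), " p-value:", signif(p_value, 3), "\n")
```

Because `N` and `n` are assumptions, the printed numbers are only of the same order as the table entries and should not be read as a reproduction of them.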
Core structure of the gene network in B-cell lymphoma
[Figure: network plot; labelled genes: NR5A1, SNX29, CALCA, SLC6A7, P2RX6, CLDN9, TGM4, CACNA1F]
B-cell lymphoma, C3NET: 9,221 edges (total)
R. de Matos Simoes, S. Tripathi and F. Emmert-Streib, Organization of the gene network of B-cell lymphoma, BMC Systems Biology, 6(1):38, 2012.
Basso (2005), ARACNE: 129,000 edges (total)
K. Basso, A. A. Margolin, G. Stolovitzky, U. Klein, R. Dalla-Favera, and A. Califano. Reverse engineering of regulatory networks in human B cells. Nature Genetics, 37(4):382-390, 2005.
Summary and Literature
Part VI
Summary and Literature
biological cells -> expression data -> inference method -> inferred gene regulatory network
analysis and interpretation of the network structure (gene ontology (GO), KEGG) -> elucidate phenotypic attributes
This tutorial provided:
- motivation for inferring regulatory networks
- interpretation of gene (regulatory) networks
- overview of network inference methods
- overview of evaluation methods
- discussion of different data types: biological/simulated data
- visualization of gene networks
- some applications to large-scale data sets
Review article
Statistical inference and reverse engineering of gene regulatory networks from observational expression data.
Emmert-Streib F, Glazko GV, Altay G, de Matos Simoes R. Front Genet. 2012;3:8.
Advanced topics
Bagging statistical network inference from large-scale gene expression data.
de Matos Simoes R & Emmert-Streib F. PLoS One. 2012;7(3):e33624.
Influence of statistical estimators of mutual information and data heterogeneity on the inference of gene regulatory networks.
de Matos Simoes R & Emmert-Streib F. PLoS One. 2011;6(12):e29279.
Organizational structure and the periphery of the gene regulatory network in B-cell lymphoma.
de Matos Simoes R, Tripathi S & Emmert-Streib F. BMC Syst Biol. 2012;6(1):38.
Availability of the Tutorial
Web: www.bio-complexity.com
Questions: Email - v@bio-complexity.com
Problem of Induction & Causality
David Hume (1711 - 1776)
Sewall Wright (1889 - 1988)
Karl Popper (1902 - 1994)
Donald Rubin
Judea Pearl
Acknowledgements:
postdoc: Ricardo de Matos Simoes
PhD student: Shailesh Tripathi
former postdoc: Gökmen Altay (now University of Cambridge)
Paul Mullan, CCRCB; Matthias Dehmer, UMIT; Dean Fennell, CCRCB
Galina Glazko, Rochester U; Peter Hamilton, CCRCB; Dennis McCance, CCRCB
Ken Mills, CCRCB; Arcady Mushegian, Stowers Inst.; Andre Ribeiro, Tampere U
John Storey, Princeton U; Kate Williamson, CCRCB; Shu-Dong Zhang, CCRCB
Funding:
References
[1] G. Altay and F. Emmert-Streib.
Inferring the conservative causal core of gene regulatory networks.
BMC Systems Biology, 4:132, 2010.
[2] G. Altay and F. Emmert-Streib.
Revealing differences in gene network inference algorithms on the network-level by ensemble methods.
Bioinformatics, 26(14):1738-44, 2010.
[3] G. Altay and F. Emmert-Streib.
Structural Influence of gene networks on their inference: Analysis of C3NET.
Biology Direct, 6:31, 2011.
[4] M. Ashburner, C. Ball, J. Blake, D. Botstein, and H. Butler et al.
Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.
Nature Genetics, 25(1):25-29, May 2000.
[5] K. Basso, A. A. Margolin, G. Stolovitzky, U. Klein, R. Dalla-Favera, and A. Califano.
Reverse engineering of regulatory networks in human B cells.
Nature Genetics, 37(4):382-390, 2005.
[6] A. Butte and I. Kohane.
Mutual information relevance networks: Functional genomic clustering using pairwise entropy measurements.
Pacific Symposium on Biocomputing, 5:415-426, 2000.
[7] A. Butte, P. Tamayo, D. Slonim, T. Golub, and I. Kohane.
Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks.
Proc Natl Acad Sci U S A, 97(22):12182-6, 2000.
[8] G. Chen, P. Larsen, E. Almasri, and Y. Dai.
Rank-based edge reconstruction for scale-free genetic regulatory networks.
BMC Bioinformatics, 9:75, 2008.
[9] D. Chickering.
Learning from Data: Artificial Intelligence and Statistics V, chapter Learning Bayesian networks is NP-complete, pages 121-130.
Springer, 1996.
[10] S. Datta and S. Datta.
Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes.
BMC Bioinformatics, 7(1):397, 2006.
[11] A. de la Fuente, N. Bing, I. Hoeschele, and P. Mendes.
Discovery of meaningful associations in genomic data using partial correlation coefficients.
Bioinformatics, 20(18):3565-3574, 2004.
[12] R. de Matos Simoes and F. Emmert-Streib.
Bagging statistical network inference from large-scale gene expression data.
PLoS ONE, 7(3):e33624, 2012.
[13] B. Di Camillo, G. Toffolo, and C. Cobelli.
A gene network simulator to assess reverse engineering algorithms.
Ann N Y Acad Sci., 1158:125-42, 2009.
[14] F. Emmert-Streib, G. Glazko, G. Altay, and R. de Matos Simoes.
Statistical inference and reverse engineering of gene regulatory networks from observational expression data.
Frontiers in Genetics, 3:8, 2012.
[15] J. J. Faith, B. Hayete, J. T. Thaden, I. Mogno, and J. Wierzbowski et al.
Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles.
PLoS Biol, 5, January 2007.
[16] T. Fawcett.
An introduction to ROC analysis.
Pattern Recognition Letters, 27:861-874, 2006.
[17] S. Gama-Castro, V. Jimenez-Jacinto, M. Peralta-Gil, A. Santos-Zavaleta, M. I. Penaloza-Spinola, B. Contreras-Moreira, J. Segura-Salazar, L. Muniz-Rascado, I. Martinez-Flores, H. Salgado, C. Bonavides-Martinez, C. Abreu-Goodger, C. Rodriguez-Penagos, J. Miranda-Rios, E. Morett, E. Merino, A. M. Huerta, L. Trevino-Quintanilla, and J. Collado-Vides.
RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation.
Nucl. Acids Res., 36(1):D120-124, 2008.
[18] M. Grzegorczyk, D. Husmeier, K. Edwards, P. Ghazal, and A. Millar.
Modelling non-stationary gene regulatory processes with a non-homogeneous Bayesian network and the allocation sampler.
Bioinformatics, 24(8):2071-8, 2008.
[19] B. C. Haynes and M. R. Brent.
Benchmarking regulatory network reconstruction with GRENDEL.
Bioinformatics, 25(6):801-7, 2009.
[20] M. Kanehisa and S. Goto.
KEGG: Kyoto Encyclopedia of Genes and Genomes.
Nucleic Acids Res., 28:27-30, 2000.
[21] K. Liang and X. Wang.
Gene regulatory network reconstruction using conditional mutual information.
EURASIP J Bioinform Syst Biol., 2008:253894, 2008.
[22] D. Marbach, T. Schaffter, C. Mattiussi, and D. Floreano.
Generating Realistic In Silico Gene Networks for Performance Assessment of Reverse Engineering Methods.
Journal of Computational Biology, 16(2), 2009.
[23] A. Margolin, I. Nemenman, K. Basso, C. Wiggins, and G. Stolovitzky et al.
ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context.
BMC Bioinformatics, 7:S7, 2006.
[24] H. W. Mewes, D. Frishman, U. Guldener, G. Mannhaupt, and K. Mayer et al.
MIPS: a database for genomes and protein sequences.
Nucl. Acids Res., 30(1):31-34, 2002.
[25] P. Meyer, F. Lafitte, and G. Bontempi.
minet: A R/Bioconductor Package for Inferring Large Transcriptional Networks Using Mutual Information.
BMC Bioinformatics, 9(1):461, 2008.
[26] A. Ribeiro and J. Lloyd-Price.
Sgn sim, a stochastic genetic networks simulator.
Bioinformatics, 23(6):777, 2007.
[27] J. Schäfer and K. Strimmer.
An empirical Bayes approach to inferring large-scale gene association networks.
Bioinformatics, 21:754-764, 2005.
[28] B. Shipley.
Cause and Correlation in Biology.
Cambridge University Press, Cambridge; New York, 2000.
[29] N. Soranzo, G. Bianconi, and C. Altafini.
Comparing association network algorithms for reverse engineering of large-scale gene regulatory networks: synthetic versus real data.
Bioinformatics, 23(13):1640-7, 2007.
[30] T. Van den Bulcke, K. Van Leemput, B. Naudts, P. van Remortel, H. Ma, A. Verschoren, B. De Moor, and K. Marchal.
SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms.
BMC Bioinformatics, 7(1):43, 2006.
[31] J. Watkinson, K. Liang, X. Wang, T. Zheng, and D. Anastassiou.
Inference of regulatory gene interactions from expression data using three-way mutual information.
Ann N Y Acad Sci., 1158:302-13, 2009.
[32] C. Wolfe, I. Kohane, and A. Butte.
Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks.
BMC Bioinformatics, 6(1):227, 2005.
[33] M. Xiong, J. Li, and X. Fang.
Identification of genetic networks.
Genetics, 166:1037-1052, 2004.
[34] B. Zhang and S. Horvath.
A general framework for weighted gene co-expression network analysis.
Stat Appl Genet Mol Biol., 4:17, 2005.
[35] W. Zhao, E. Serpedin, and E. R. Dougherty.
Inferring connectivity of genetic regulatory networks using information-theoretic criteria.
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 5(2):262-274, 2008.