V12 GRN Boolean Network
Boolean Networks
Gene-Regulatory Network of E. coli
Boolean Networks
- toy systems
- hematopoiesis
Bioinformatics 3 – WS 22/23 V 12 – 1
In lecture V9, we talked about the binding of transcription factors (TFs) to DNA.
Lecture V10 discussed gene expression. In lecture V11, we looked at differential
expression analysis of RNAseq data and then turned to translation of peptide chains.
Now it is time to put things together in the form of so-called gene-regulatory
networks (GRNs). They comprise all TFs, microRNAs and regulated target genes of an
entire organism or of a subsystem.
Gene Expression
Sequence of processes: from DNA to functional proteins
[Figure: DNA is transcribed (controlled by TFs) into RNA. In eukaryotes, RNA
processing (capping, splicing) yields mRNA, which is transported from the nucleus to
the cytosol. There the mRNA, bound to RNA-binding proteins (RBPs), is translated into
protein or degraded (microRNAs). Post-translational modifications yield the active
protein; a ubiquitination mark leads to the degraded protein.]
→ regulation at every step!
This is the central dogma of molecular biology again, with some of the regulatory
components added. In the 1950s, nobody knew about microRNAs. The binding of RBPs
(RNA-binding proteins) to mRNAs is currently a hot topic.
What is a GRN?
Gene regulatory networks (GRN) are model representations
of how genes regulate the expression levels of each other.
GRNs may possess either 2 ingredients (TFs and target genes) or 3 ingredients (TFs,
microRNAs and target genes). We will talk about microRNAs in detail in lecture V16.
For the moment, it is sufficient to mention that microRNAs generally repress the
expression of their target mRNAs by binding to the 3'-UTR of the mRNA.
Layers upon Layers
Biological regulation via proteins and metabolites <=> Projected regulatory network
[Figure legend: activation, repression, self-repression]
In principle, there exist several layers of molecular networks. Gene expression may be
regulated at different levels (left figure). E.g. gene 1 may encode TF 1, which then
negatively regulates gene 2. Gene 2 may encode the enzyme protein 2, which converts
metabolite 1 into metabolite 2. This may lead to activation of the expression of gene
4. Gene 4 codes for protein 4, which forms a protein-protein complex with protein 3
that may then activate gene 2. This is a hypothetical example, of course.
To simplify things, we project all these effects onto the layer of genes (right figure).
For example, the inhibitory arrow from gene 1 to gene 2 must be interpreted as acting
via the encoded TF 1, and so on. Genes themselves do not bind to the promoters of
other genes.
Global Regulators in E. coli
Simple organisms have hierarchical GRNs
Largest weakly connected component (WCC)
(ignore directions of regulation): 325 operons
(3/4 of the complete network)
Lowest level: operons that code for TFs with only autoregulation, or no TFs
Next layer: delete nodes of lower layer, identify TFs that do not
regulate other operons in this layer (only lower layers)
Continue …
Shown on the left is the GRN as obtained e.g. from force-directed layout. Shown on
the right is a different visualization that emphasizes the hierarchical organization.
Global regulators have no incoming edges. They regulate other genes „top down“.
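The layering procedure from the slide can be sketched in a few lines of Python. This is a minimal sketch, not the published analysis: the function name hierarchy_layers and the toy network are hypothetical, and self-loops are treated as autoregulation that does not count as regulating another operon.

```python
def hierarchy_layers(edges, nodes):
    """Peel a directed regulatory graph into layers, bottom-up.

    edges: set of (regulator, target) pairs; self-loops (autoregulation)
    are ignored. Returns a list of sets; layers[0] is the lowest layer.
    """
    remaining = set(nodes)
    layers = []
    while remaining:
        # A node belongs to the current layer if it regulates no other
        # node that is still present in the network.
        layer = {
            n for n in remaining
            if not any(src == n and dst != n and dst in remaining
                       for src, dst in edges)
        }
        if not layer:  # only cycles are left; stop instead of looping forever
            layers.append(set(remaining))
            break
        layers.append(layer)
        remaining -= layer
    return layers


# Hypothetical toy network (not the E. coli data): G is a global regulator,
# L1 and L2 are local regulators, opA and opB are operons without TFs.
toy_edges = {("G", "L1"), ("G", "L2"), ("L1", "opA"), ("L2", "opB"), ("L2", "L2")}
toy_nodes = {"G", "L1", "L2", "opA", "opB"}
layers = hierarchy_layers(toy_edges, toy_nodes)
print(layers)
```

In the toy network the operons end up in the lowest layer, the local regulators in the middle layer, and the global regulator G, which has no incoming edges, at the top.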
E. coli GRN modules
When the top 3 layers are removed, there is a clear modular structure. Weakly
connected components (connected by undirected edges) are colored in one color.
Putting it back together
The 10 global regulators are at the core of the network,
some hierarchies exist between the modules
This is another view of the full GRN where the global regulators are placed at the
center. It shows the functional modules in the transcriptional regulatory network of
E. coli. Operons in different modules are shown in different colors. The ten global
regulators form the core part of the network. The peripheral modules are connected
mainly through the global regulators.
Modules have specific functions
The 10 global regulators regulate 33 separate „modules“ that each consist of several
operons.
Frequency of co-regulation
Half of all target genes are regulated by multiple TFs.
In most cases, a „global“ regulator (with > 10 interactions)
works together with a more specific local regulator.
Martinez-Antonio, Collado-Vides, Curr Opin Microbiol 6, 482 (2003)
TF regulatory network in E. coli
When more than one TF regulates a gene, the order of
their binding sites is as given in the figure.
Response to changes in environmental
conditions
TFs sense changes in environmental conditions or other changes that encode
internal signals.
As mentioned in lecture V9, 75% of the TFs are 2-domain proteins with one DNA-binding
domain and one regulatory domain. TFs sense changes in environmental conditions
or internal signals that encode changes. Bacteria constantly monitor extracellular
physicochemical conditions so that they can respond by modifying their gene
expression patterns to adjust growth. The link between changes in environmental
conditions and changes in transcriptional regulation involves signal-transduction
pathways. The precise link with regulatory proteins is achieved by small metabolites.
Half of the known and predicted transcriptional regulators in E. coli have predicted
domains for the binding of small metabolites. The role of both local and global
regulators is to mediate precise activation or repression by sensing changes in specific
metabolites. The only difference is that global TFs sense a larger number of growth
conditions.
Story: Quorum sensing of Vibrio fischeri
V. fischeri lives in a symbiotic relationship with the squid Euprymna scolopes.
The bacterium exists at low density in the ocean (10^2 cells/ml) and at high
density in the light organs of the squid (10^10 cells/ml).
The light organ of the squid provides the bacteria with all the nutrients
that they need to survive.
The squid benefits from the bacteria's quorum sensing and bioluminescence
abilities.
https://www.bio.cmu.edu/courses/03441/TermPapers/99TermPapers/Quorum/vibrio_fischeri.html
Quorum sensing of Vibrio fischeri
The cell density-dependent control of gene expression is activated by a
transcriptional activator protein that is coupled to a signal molecule
(autoinducer).
The autoinducer is released by the bacteria into their surrounding environment and
taken up from there.
[Image: the Hawaiian bobtail squid, Euprymna scolopes.]
During the day, the squid keeps the bacteria at lower concentrations by expelling
some of them into the ocean at regular intervals.
https://www.bio.cmu.edu/courses/03441/TermPapers/99TermPapers/Quorum/vibrio_fischeri.html
www.wikipedia.org
Wikipedia:
„Aliivibrio fischeri (also called Vibrio fischeri) is a Gram-negative, rod-shaped
bacterium found globally in marine environments. This species has bioluminescent
properties, and is found predominantly in symbiosis with various marine animals,
such as the Hawaiian bobtail squid. It is named after Bernhard Fischer, a German
microbiologist. “
Vibrio fischeri helps with Camouflage
This is perfect for the squid because it is a night feeder.
In the moonlight, the swimming squid would normally cast a shadow beneath
itself making it a perfect target for squid-eating organisms.
https://www.bio.cmu.edu/courses/03441/TermPapers/99TermPapers/Quorum/vibrio_fischeri.html
Image from quantamagazine.org
Here, you can read more about the symbiotic relationship of bobtail squid and this
bacterium:
https://www.biointeractive.org/sites/default/files/BobtailSquid-Educator-film.pdf
Quorum sensing of Vibrio fischeri
[Figure: quorum-sensing circuit with the genes luxR and luxICDABE, the proteins LuxR,
LuxI, LuxA and LuxB, and the autoinducer (AI)]
Boolean Networks
Dependencies between variables can be formulated as conditional transitions
In a Boolean network, variables have discrete states. In the simplest models, they can
only have the states on/off or 1/0.
The dependencies between variables can result in state changes.
Boolean Networks II
State of the system: described by vector of discrete values
S_i = {0, 1, 1, 0, 0, 1, …}
Propagation:
S_{i+1} = {x_1(i+1), x_2(i+1), x_3(i+1), …}
x_1(i+1) = f_1(x_1(i), x_2(i), x_3(i), …), with the f_i given by condition tables
A simulation will start from a particular initial condition, where all variables are set
either to 1 or 0. The state of the system is then described by a vector holding the
values of all variables.
A Small Example
State vector S = {A, B, C} → 8 possible states
Conditional evolution:
A is on if C is on; A activates B; C is on if (B is on && A is off)
(assume here that inhibition through A is stronger than activation via B)
  A_{i+1} | C_i      B_{i+1} | A_i      C_{i+1} | A_i  B_i
     0    |  0          0    |  0          0    |  0    0
     1    |  1          1    |  1          1    |  0    1
                                           0    |  1    0
                                           0    |  1    1
Start from {A, B, C} = {1, 0, 0}
  #   S_i   A  B  C
  0   S0    1  0  0
  1   S1    0  1  0
  2   S2    0  0  1
  3   S3    1  0  0   (= S0)
Let's consider the regulatory relationships in the top right picture. 3 variables can
each be in 2 states (0/1) → there are 2^3 = 8 states.
The three blue matrices in the second row are condition tables describing the state of
the left variable in iteration i+1 as a function of the right variables at iteration i.
One of the conditions is „C activates A“. Thus, if C_i is off, A will be off in the
next iteration. If C_i is on, A will be on in the next iteration.
The green matrix at the bottom describes the time evolution starting from one
particular initial state. When we start from 1/0/0, the system oscillates and visits two
other states before it returns to the initial state. Since the Boolean network is
deterministic, the second cycle would be identical to the first one. Thus, of the 8
possible states we have already visited 3.
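The three condition tables translate directly into a synchronous update function. The following is a minimal sketch; the function names step and trajectory are chosen here for illustration and do not come from the lecture.

```python
def step(state):
    """One synchronous update of the three-node example (A, B, C)."""
    a, b, c = state
    return (
        c,                  # A is on if C is on
        a,                  # A activates B
        int(b and not a),   # C is on if B is on and A is off
    )


def trajectory(start, n_steps):
    """Return the list of states visited in n_steps updates."""
    states = [start]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states


for i, s in enumerate(trajectory((1, 0, 0), 3)):
    print(i, s)
```

Starting from (1, 0, 0), the sketch visits (0, 1, 0) and (0, 0, 1) and is back at (1, 0, 0) after three steps, in line with the table on the slide.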
Test the Other Starting Conditions
Test the other states
  A_{i+1} | C_i      B_{i+1} | A_i      C_{i+1} | A_i  B_i
     0    |  0          0    |  0          0    |  0    0
     1    |  1          1    |  1          1    |  0    1
                                           0    |  1    0
                                           0    |  1    1
  #  A B C        #  A B C        #  A B C
  0  1 1 1        0  1 0 1        0  0 1 1
  1  1 1 0        1  1 1 0        1  1 0 1
  2  0 1 0
  3  0 0 1
  4  1 0 0
  5  0 1 0
Here, we also study what happens when we start from the 5 other states. If we start
from 1/1/1, the system passes through 1/1/0 and converges to the same cycle that we
saw on the previous slide. The states 1/0/1 and 0/1/1 also transition to this
periodic cycle. The 0/0/0 state does not oscillate; it maps onto itself.
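Testing all starting conditions by hand quickly becomes tedious, so here is a small sketch that enumerates all 8 states and groups them by the attractor they reach. The helper name find_attractor and the dictionary basins are illustrative choices.

```python
from itertools import product


def step(state):
    """Synchronous update of the three-node example (A, B, C)."""
    a, b, c = state
    return (c, a, int(b and not a))


def find_attractor(start):
    """Iterate until a state repeats; return the states of the final cycle."""
    seen = []
    s = start
    while s not in seen:
        seen.append(s)
        s = step(s)
    return frozenset(seen[seen.index(s):])


# Map each attractor to the list of initial states that end up in it.
basins = {}
for start in product((0, 1), repeat=3):
    basins.setdefault(find_attractor(start), []).append(start)

for attractor, members in basins.items():
    print(sorted(attractor), "period", len(attractor), "basin size", len(members))
```

For this network the scan finds two attractors: the fixed point 0/0/0, which only attracts itself, and the cycle of period 3, whose basin contains the remaining 7 states.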
A Knock-out Mutant
  A_{i+1} | C_i      B_{i+1} | A_i      C_{i+1} | B_i
     0    |  0          0    |  0          0    |  0
     1    |  1          1    |  1          1    |  1
Attractors:
  #  A B C        #  A B C        #  A B C        #  A B C
  0  1 0 0        0  1 1 0        0  1 1 1        0  0 0 0
  1  0 1 0        1  0 1 1        1  1 1 1        1  0 0 0
  2  0 0 1        2  1 0 1
  3  1 0 0        3  1 1 0
no feedback
→ no stabilization, network just "rotates"
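If we remove the regulatory feedback (the inhibition of C by A), the condition tables
are all identical: each variable simply copies its input. Now there are 4 attractors,
namely two cycles of period 3 and the two fixed points 1/1/1 and 0/0/0.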
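The same scan can be repeated for the knock-out mutant. This is a sketch under the assumption that the mutant rule for C is simply "C copies B", as in the condition tables of this slide; the names step_knockout and cycle_of are illustrative.

```python
from itertools import product


def step_knockout(state):
    """Update rule with the inhibition of C by A removed: pure rotation."""
    a, b, c = state
    return (c, a, b)


def cycle_of(start):
    """Return the cycle that is reached from the given start state."""
    s, seen = start, []
    while s not in seen:
        seen.append(s)
        s = step_knockout(s)
    return frozenset(seen[seen.index(s):])


cycles = {cycle_of(s) for s in product((0, 1), repeat=3)}
for cyc in sorted(cycles, key=lambda c: (len(c), sorted(c))):
    print(len(cyc), sorted(cyc))
```

Because the mutant network only rotates the state vector, every state lies on a cycle, and no state is funneled into another attractor.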
Boolean Network of QS
Minimum set of species:
[Figure: luxR / luxICDABE circuit with AI]
How does LuxI depend on LuxR:AI:Genome?
  LuxI | LuxR:AI:Genome
   0   |  0
   1   |  1
How does LuxR:AI:Genome depend on LuxR:AI?
  LuxR:AI:Genome | LuxR:AI
        0        |  0
        1        |  1
Now we come back to the Vibrio fischeri example. The aim is always to set up the
simplest possible Boolean network. Therefore, we omit the light-producing proteins
LuxA/B because they are read from the same operon as LuxI. Thus, their
concentration can be assumed to be proportional to that of LuxI.
In our model, we don't represent the unbinding (dissociation) of LuxR:AI from the
genome. In principle, this binding reaction is reversible, and including this would
make the model more realistic. The idea is always to simplify all steps.
Condition Tables for QS II
[Figure: luxR / luxICDABE circuit with LuxR, LuxI and AI]
Condition table for LuxR:
  LuxR | LuxR  AI  LuxR:AI:Genome
   1   |  0    0   0    all OFF: LuxR production is not inhibited
   1   |  1    0   0
   1   |  0    1   0
   1   |  1    1   0
   0   |  0    0   1
   1   |  1    0   1    LuxR present, no AI available
   0   |  0    1   1
   0   |  1    1   1    LuxR present, binds AI in next step, no LuxR is produced
                        because LuxR:AI:Genome inhibits LuxR production
Condition table for LuxR:AI:
  LuxR:AI | LuxR  AI  LuxR:AI:Genome
     0    |  0    0   0
     0    |  1    0   0
     0    |  0    1   0
     1    |  1    1   0
     0    |  0    0   1
     0    |  1    0   1
     0    |  0    1   1
     1    |  1    1   1
These are the condition tables for LuxR and the complex of LuxR with autoinducer
(LuxR:AI). When LuxR and AI are both present, the LuxR:AI complex forms in the next
step.
Condition tables for QS III
[Figure: luxR / luxICDABE circuit with LuxR, LuxI and AI]
Full condition table:            Simplified (x = either value):
  AI | LuxR  AI  LuxI              AI | LuxR  AI  LuxI
   0 |  0    0    0                 1 |  x    x    1
   0 |  1    0    0                 0 |  x    0    0
   1 |  0    1    0                 1 |  0    1    0
   0 |  1    1    0                 0 |  1    1    0
   1 |  0    0    1
   1 |  1    0    1
   1 |  0    1    1
   1 |  1    1    1
The condition table for AI can be simplified. If LuxI is on, AI is set on irrespective of
the states of LuxR and AI (first row in right table).
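That the simplified table is equivalent to the full one can be checked mechanically. The sketch below encodes the full table as read from the slide (input order LuxR, AI, LuxI) and compares it row by row with a function implementing the four simplified rows; the names full_table and ai_next are illustrative.

```python
# Full condition table: (LuxR, AI, LuxI) -> AI in the next iteration.
full_table = {
    (0, 0, 0): 0, (1, 0, 0): 0, (0, 1, 0): 1, (1, 1, 0): 0,
    (0, 0, 1): 1, (1, 0, 1): 1, (0, 1, 1): 1, (1, 1, 1): 1,
}


def ai_next(luxr, ai, luxi):
    """Simplified rule with don't-care entries, one branch per table row."""
    if luxi:
        return 1           # row "1 | x x 1": LuxI produces AI
    if not ai:
        return 0           # row "0 | x 0 0": no AI and no LuxI
    return int(not luxr)   # rows "1 | 0 1 0" and "0 | 1 1 0"


matches = all(ai_next(*inputs) == out for inputs, out in full_table.items())
print(matches)
```

The comparison succeeds for all 8 input combinations, so the four-row table loses no information.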
Scanning for Attractors
States of V. fischeri QS system are mapped onto integers
Periodic cyclic transitions are termed attractors. Their length (number of visited
states) is termed the period. The basin of attraction of an attractor consists of all
initial states that converge into it. The simplified Vibrio fischeri system has 5
variables and therefore 2^5 = 32 states.
If all variables are initially OFF, the production of LuxR is not inhibited, and LuxR will
be ON in the next iteration. Since no autoinducer exists, the system remains
indefinitely in this state. This is the first attractor. We will denote the states as binary
numbers that we read in reverse order. For example, the state 1/0/0/0/0 read as a
binary number in reversed order (00001) corresponds to the number 1.
Scanning for Attractors II
Attractor 2: orbit: 3, 9, 17, 5 → period 4
LR RA AI RAG LI
4/4 = 1 1/4 = 0.25 1/4 = 0.25 1/4 = 0.25 1/4 = 0.25
If we set RAG = ON, LuxI will be produced in the next iteration, followed by
production of LuxR and AI, then the complex RA forms, which then binds to the
genome (RAG), which again leads to synthesis of LuxI. From this state on, the system
remains in this attractor. Seven states lead to this attractor. Thus, its basin of
attraction covers 7/32 = 21.9% of all states.
In this periodic orbit, LuxR is always ON, whereas the LuxR:AI complex is present in
only one of the 4 states (state 3), and so on. In this way one can compute the listed
average occupancies.
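The integer labels and the occupancies can be reproduced with a few lines of code. This sketch assumes the variable order LR, RA, AI, RAG, LI from the slide and the reversed-binary convention described earlier, i.e. the first variable is the least significant bit; the helper names decode and encode are illustrative.

```python
NAMES = ("LR", "RA", "AI", "RAG", "LI")  # variable order used on the slide


def decode(number):
    """Integer -> state; the first variable (LR) is the least significant bit."""
    return {name: (number >> k) & 1 for k, name in enumerate(NAMES)}


def encode(state):
    """State -> integer, the inverse of decode."""
    return sum(state[name] << k for k, name in enumerate(NAMES))


orbit = [3, 9, 17, 5]  # attractor 2 as given on the slide
occupancy = {
    name: sum(decode(n)[name] for n in orbit) / len(orbit) for name in NAMES
}
print(occupancy)
```

With this convention, state 3 is the only state of the orbit in which RA is ON, and the averages come out as 1 for LR and 0.25 for each of the other four variables, matching the numbers on the slide.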
Attractors III
Attractor 3: period 4, basin of 16 states → 50 %
# LR RA AI RAG LI
. X X . .
. X X X .
. . X X X
. . X . X
Classifying the Attractors
→ Interpret the system's behavior from the properties of the attractors
[Table of attractors; only one row survived the extraction: attractor 1, period 1,
basin 6.25 % (2 states), state LR RA AI RAG LI = 1 0 0 0 0]
Altogether, there are 5 attractors. We can characterize them by their brightness (state
of LuxI). These 3 states correspond to the luminescent states of Vibrio fischeri in its
symbiosis with the bobtail squid.
The Feed-Forward-Loop
Signal propagation
External signal determines state of X
→ response Z for short and long signals
[Figure: feed-forward loop X → Y → Z with a direct edge X → Z, shown once with
activations (top) and once with repressions (bottom). Each panel has condition tables
for Y (input X) and for Z (inputs X, Y), and state tables with the columns external
signal, X, Y, Z for a short and a long signal. The table entries could not be recovered
from the extracted slide.]
This is an example of how a feed-forward loop (top left system) can determine whether
an activating signal is sufficiently long. A short activation (setting X = ON for only
1 step, see top right example) is not transmitted to Z. Only when X is activated for
long enough is Z turned ON. The same thing happens in the lower example, where
activations are replaced by repressions. In this case, the response to the long signal
has exactly the same length.
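The filtering behavior of the activating feed-forward loop can be reproduced with a short simulation. This is a sketch under assumptions: X copies the external signal, Y copies X with one step delay, and Z is ON only if X and Y were both ON in the previous step (an AND gate, which is assumed here because the condition table for Z is not readable in the extracted slide). The function name simulate_ffl is illustrative.

```python
def simulate_ffl(signal):
    """Feed-forward loop with activations only and an assumed AND gate at Z."""
    x = y = z = 0
    history = []
    for s in signal:
        # Synchronous update: the right-hand side uses the old values.
        x, y, z = s, x, int(x and y)
        history.append((x, y, z))
    return history


short = simulate_ffl([0, 1, 0, 0, 0])
long_ = simulate_ffl([0, 1, 1, 1, 0, 0])
print("short:", [z for _, _, z in short])
print("long: ", [z for _, _, z in long_])
```

In this sketch the one-step pulse never switches Z on, because X is already OFF again when Y finally turns ON. The three-step pulse does reach Z, but only after a delay of two steps.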
Can Boolean Networks be predictive?
Generally: → quality of the results depends on the quality of the model
→ quality of the model depends on the quality of the assumptions
This is a reflection on the severe assumptions that we need to make in order to apply
Boolean networks. Can one use them to describe realistic systems?
Understand blood development (hematopoiesis)
with the help of Boolean Networks
Blood development represents one of the earliest
stages of organogenesis. The production of primitive
erythrocytes is required to support the growing embryo.
These cells initially have the potential to form
blood, endothelium or smooth muscle cells.
Moignard et al., Nature Biotech. 33, 269 (2015)
Early stages of hematopoiesis
In this study, cells were flow sorted into single Flk1+ cells at E7.0 (primitive
streak, PS), E7.5 (neural plate, NP) and E7.75 (head fold, HF) stages.
E7.0 is day 7 in mouse development. The top right figure shows the embryo at E7.5,
see
https://embryology.med.unsw.edu.au/embryology/index.php/Mouse_Timeline_Detailed#Day_7_.28E7.0.29
At E7.5, early blood formation starts. Until then, the embryo is nourished entirely by
the mother.
Cell sorting is done based on FACS analysis.
Studied cells
Cells were sorted from multiple
embryos at each time point.
3,934 cells were analyzed in total.
For the early stages, more embryos were needed because they are smaller and contain
fewer cells (bottom right picture).
Assay gene expression in single cells
Gene expression in single cells
assayed with PCR for:
- 33 transcription factors known to be
involved in endothelial and
hematopoietic development
- 9 marker genes (needed for FACS-
sorting)
- 4 house-keeping genes (needed
for quality checks and normalization)
Cells were discarded that did not
express all 4 house-keeping genes,
or for which their expression was
more than 3 standard deviations
from the mean.
www.fluidigm.com
Moignard et al.,
Nature Biotech.
33, 269 (2015)
In 2015, single-cell sequencing was still very new. As an alternative, the authors
monitored the expression levels of 46 genes by qRT-PCR, 33 of them TFs known to be
important for endothelial and hematopoietic development.
Hierarchical clustering of gene expression data
3 main clusters:
Cell differentiation progresses asynchronously
On the y-axis are the 33 TFs and the 9 marker genes. On the x-axis are all analyzed
single cells.
Hierarchical clustering using Spearman rank correlation and complete linkage yielded
3 main clusters shown at the top (labeled I, II and III).
Dimensionality reduction: diffusion maps
Similarity of expression x in cells i and j :
These are so-called diffusion maps to illustrate the developmental progression along
three principal components. The idea is that cells „diffuse“ along a developmental
trajectory and may bifurcate at developmental branching points (here: state HF). The
matrix P measures the similarity of expression between pairs of cells by the so-called
„diffusion metric“ defined by the formula.
ε can be interpreted as time in a diffusional process.
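The formula itself is not reproduced in these notes. As a rough illustration of the general idea only, the following sketch builds a row-normalized Gaussian similarity matrix between cells, which is the usual starting point of a diffusion map. The Gaussian kernel, the width parameter sigma and the function name transition_matrix are assumptions made here; the kernel and normalization used by the authors may differ.

```python
import math


def transition_matrix(cells, sigma=1.0):
    """Row-normalized Gaussian similarity between expression vectors.

    cells: list of equal-length lists of expression values.
    sigma: kernel width (an assumed free parameter).
    """
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    kernel = [
        [math.exp(-sq_dist(u, v) / (2.0 * sigma ** 2)) for v in cells]
        for u in cells
    ]
    # Divide each row by its sum, so that row i holds the probabilities
    # of stepping from cell i to every cell j.
    return [[k / sum(row) for k in row] for row in kernel]


# Hypothetical toy data: three cells, two genes.
cells = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]]
P = transition_matrix(cells)
for row in P:
    print([round(p, 3) for p in row])
```

Cells with similar expression get a high transition probability, so a random walk on this matrix tends to stay among related cells. The leading eigenvectors of such a matrix provide the low-dimensional coordinates of the diffusion map.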
Who regulates hematopoiesis?
Design Boolean Network
Note that very few of the possible states have been observed.
Moignard et al.,
Nature Biotech.
33, 269 (2015)
The 33 TFs have 2^33 ≈ 8.6 billion possible on/off states, but only about 4000 states
were measured as single cells. Thus, one cannot consider all possible states. A way out
is to consider only the occupied states and the transitions between them.
State graph of largest connected comp.
Moignard et al.,
Nature Biotech.
33, 269 (2015)
State graph (largest connected component) of 1448 states reaching all 5 stages.
Add edges to connect all those pairs of states that differ in the on/off levels of a
single gene (and are identical otherwise), see right side with labeled edges.
This network represents the largest connected component of 1448 states (cells). All 5
stages (PS, NP, HF, 4SG, 4SFG-) are populated. Edges connect all pairs of states
that differ only in the expression level (ON/OFF) of 1 TF.
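The construction of the state graph can be sketched as follows: every observed binary state is a node, two nodes are connected if they differ in exactly one gene, and the largest connected component is then extracted. The toy states below are hypothetical and the function names are illustrative; this is not the authors' code.

```python
from collections import defaultdict


def state_graph(states):
    """Connect every pair of observed states that differ in exactly one gene."""
    observed = set(states)
    graph = defaultdict(set)
    for s in observed:
        for i in range(len(s)):
            flipped = s[:i] + (1 - s[i],) + s[i + 1:]
            if flipped in observed:
                graph[s].add(flipped)   # this edge would be labeled with gene i
    return observed, graph


def largest_component(observed, graph):
    """Depth-first search over all nodes; return the largest component."""
    seen, best = set(), set()
    for start in observed:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical binarized expression states of 4 genes (not the real data).
states = [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1), (1, 1, 1, 1)]
observed, graph = state_graph(states)
lcc = largest_component(observed, graph)
print(sorted(lcc))
```

Flipping one gene at a time and looking the result up in a set avoids comparing all pairs of states, which matters once a few thousand cells are involved.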
Automatic derivation of rules for
Boolean Network
We are given:
Moignard et al.,
Nature Biotech.
33, 269 (2015)
The authors encoded the task as a Boolean satisfiability (SAT) problem in order to
determine the logical rules of the observed state transitions automatically from the
data.
Optimality criteria for rules
The rule synthesis method searches for an update function u_i: {0,1}^n → {0,1}
for each variable v_i ∈ V, such that the following conditions hold:
1. For each edge (s_1, s_2) labeled with variable v_i in the orientated graph,
the update function for v_i takes state s_1 to state s_2: u_i(s_1) = s_2(i).
Moignard et al.,
Nature Biotech.
33, 269 (2015)
This is a so-called reverse-engineering problem. We know the states that were visited
during the developmental progression, but we don't know the logic, i.e. which TF
regulates which other TF. These rules need to be derived from the data.
Allowed complexity of the rules
The update function u_i is restricted to have the form:
f_1 ∧ ¬f_2
where f_j is a Boolean formula that has
and-nodes of in-degree two,
or-nodes of arbitrary in-degree, and
where f_1 has a maximum depth of N_i and f_2 has a maximum depth of M_i.
The search for edge orientations and associated Boolean update rules is
encoded as a Boolean satisfiability (SAT) problem.
Moignard et al.,
Nature Biotech.
33, 269 (2015)
We will not go into the issue of solving SAT problems here. This is quite involved, see
e.g. http://cse.unl.edu/~choueiry/Documents/SATSolvers-KR-book-draft-07.pdf
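To make the consistency condition concrete without a SAT solver, the sketch below uses brute-force enumeration instead. It tests only the smallest rule family of the allowed form, namely one activator and at most one repressor (u = x_act AND NOT x_rep), against a handful of labeled transitions. This replaces the SAT encoding used by the authors with plain enumeration; the function names and the toy transitions are hypothetical.

```python
def candidate_rules(n_genes, target):
    """Yield (activator, repressor) pairs encoding u = x[act] AND NOT x[rep];
    repressor None means the rule is simply u = x[act]."""
    others = [g for g in range(n_genes) if g != target]
    for act in others:
        yield act, None
        for rep in others:
            if rep != act:
                yield act, rep


def apply_rule(rule, state):
    """Evaluate the update rule on a state (tuple of 0/1 values)."""
    act, rep = rule
    return int(state[act] and (rep is None or not state[rep]))


def consistent_rules(n_genes, target, transitions):
    """Keep the rules with u(s1) == s2[target] for every labeled transition."""
    return [
        rule for rule in candidate_rules(n_genes, target)
        if all(apply_rule(rule, s1) == s2[target] for s1, s2 in transitions)
    ]


# Hypothetical oriented edges labeled with gene 2 (the only gene that changes).
transitions = [
    ((1, 0, 0), (1, 0, 1)),   # gene 0 on, gene 1 off -> gene 2 switches on
    ((1, 1, 1), (1, 1, 0)),   # gene 1 on             -> gene 2 switches off
]
rules = consistent_rules(3, 2, transitions)
print(rules)
```

For these two transitions only the rule "gene 0 AND NOT gene 1" survives. With 33 TFs and deeper formulas the number of candidates explodes, which is one reason for handing the search to a SAT solver.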
Generated rules for Boolean Network
Additional validity check of
the postulated rules:
Moignard et al.,
Nature Biotech.
33, 269 (2015)
The table lists in column 2 the synthetically generated update functions (condition
tables). The entries of column 3 (% non-observed transitions disallowed) likely mean
that the generated update functions violate almost no observed transitions. The right
column checks if the target gene listed in column 1 contains in its promoter a binding
motif for the TFs present in the update function.
Core network controlling hematopoiesis
From the full network, the authors extracted a core network of 20 TFs with
endothelial and blood-associated gene modules centered on Sox7, Hoxb4 and Erg,
and on Gata1 and Spi1 (PU.1), respectively, see blue dashed ellipses.
Predict effects of perturbations as validation
Authors simulated overexpression and knockout experiments for each TF.
Assessed ability of the network to reach wildtype or new stable states.
How could one validate the correctness of the Boolean network and its predictive power?
The authors performed additional simulations mimicking the overexpression or
knockout of each TF separately, see table S5. Overexpression of Sox7 was predicted
to prevent the formation of blood-like states.
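In a Boolean network, such perturbations amount to clamping one variable: a knockout fixes the gene at 0, overexpression fixes it at 1. The sketch below illustrates this on a hypothetical 3-gene toy network (not the hematopoiesis model) and lists the stable states before and after the perturbation. The names RULES, step and stable_states are illustrative.

```python
from itertools import product

# Hypothetical 3-gene toy network: gene 0 and gene 1 repress each other,
# gene 2 is switched on by gene 1.
RULES = [
    lambda s: int(not s[1]),
    lambda s: int(not s[0]),
    lambda s: s[1],
]


def step(state, clamp=None):
    """Synchronous update; clamp = {gene: value} fixes the perturbed genes."""
    nxt = [rule(state) for rule in RULES]
    for gene, value in (clamp or {}).items():
        nxt[gene] = value
    return tuple(nxt)


def stable_states(clamp=None):
    """All fixed points of the (possibly perturbed) network."""
    return sorted(
        s for s in product((0, 1), repeat=len(RULES))
        if step(s, clamp) == s
    )


wildtype = stable_states()
overexpressed = stable_states({0: 1})   # gene 0 forced ON
knockout = stable_states({0: 0})        # gene 0 forced OFF
print(wildtype, overexpressed, knockout)
```

The toy wild type has two stable states. Forcing gene 0 ON removes the state in which genes 1 and 2 are active, which loosely mirrors the prediction that Sox7 overexpression prevents blood-like states. Fixed points do not depend on whether updates are synchronous or asynchronous, so this part of the analysis is independent of the update scheme.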
Control experiments
(b) Colony assays with or without doxycycline from genotyped E8.25 embryos from
iSox7+rtTA+ mice crossed with wild types.
Moignard et al., Nature Biotech. 33, 269 (2015)
This is a control experiment with transgenic mice in which the expression of the Sox7
gene can be induced by applying the chemical doxycycline. In the embryonic
cells collected from these mice, Sox7 can thus be over-expressed. In that case, only
half of the normal amount of erythroid cells is produced. This is as expected since
the embryo is a cross of a wild-type mother with a transgenic father.
Conclusions
Cells destined to become blood and endothelium arise at all stages of the
analyzed time course rather than in a synchronized fashion at one precise
time point. This is consistent with the gradual nature of gastrulation.
Moignard et al.,
Nature Biotech.
33, 269 (2015)