Download as pdf or txt
Download as pdf or txt
You are on page 1of 46

V12 – Gene Regulatory Networks,

Boolean Networks
Gene-Regulatory Network of E. coli
Boolean Networks
- toy systems
- hematopoiesis

Thu, 8. Dez. 2022

Bioinformatics 3 – WS 22/23 V 12 – 1

In lecture V9, we talked about the binding of transcription factors to DNA. Lecture
V10 discussed gene expression. In lecture V11, we looked at differential expression
analysis of RNAseq data and then turned to translation of peptide chains. Now is time
to put things together into the form of so-called gene-regulatory networks. They
comprise of all TFs, microRNAs and regulated target genes of an entire organism or of
a subsystem.

1
Gene Expression
Sequence of processes: from DNA to functional proteins
degraded
nucleus cytosol mRNA

transcription microRNAs
transport degradation
transcribed
DNA mRNA mRNA
RNA translation
In eukaryotes: Bound to RNA-
TFs RNA processing: binding proteins protein
capping, splicing (RBPs)
post-
translational
modifications
→ regulation at every step!!! active
protein
Ubiquitination
mark

degraded
protein
Bioinformatics 3 – WS 22/23 V 12 – 2

This is the central dogma of molecular biology again, where some of the regulatory
components were added. In the 1950s, nobody knew about microRNAs. A hot topic is
currently the binding of RBPs (RNA-binding proteins) to mRNAs.

2
What is a GRN?
Gene regulatory networks (GRN) are model representations
of how genes regulate the expression levels of each other.

In transcriptional regulation, proteins called transcription factors (TFs) regulate the


transcription of their target genes to produce
messenger RNA (mRNA).

In post-transcriptional regulation, microRNAs (miRNAs)


cause degradation and repression of target mRNAs.

These interactions are represented in a GRN by adding


edges linking TF or miRNA genes to their target mRNAs.

Narang et al. (2015). PLoS Comput Biol 11(9): e1004504


Bioinformatics 3 – WS 22/23 V 12 – 3

GRNs may either possess 2 ingredients (TFs and target genes) or 3 ingredients (TFs,
microRNAs and target genes). We will talk about microRNAs in detail in lecture V16.
For the moment, it is sufficient to mention that microRNAs generally repress mRNA
expression by binding to their 3‘-UTR region.

3
Layers upon Layers
Biological regulation
<=> Projected regulatory network
via proteins and metabolites

activation

<=> repression
self-
repression

Note that genes do not interact directly

Gene regulation networks have "cause and action"


→ directed networks
A gene can enhance or suppress the expression of another gene
→ two types of arrows

Bioinformatics 3 – WS 22/23 V 12 – 4

In principle, there exist several layers of molecular networks. Gene expression may be
regulated at different levels (left figure). E.g. gene 1 may encode TF 1 which then
negatively regulates gene 2. Gene 2 may encode the enzyme protein 2 that converts
metabolite 1 into metabolite 2. This may lead to activation of the expression of gene
4. Gene 4 codes for protein 4 that forms a PP complex with protein 3 and may then
activate gene 2. This is a hypothetical example, of course.
To simplify things, we project all these effects onto the layer of genes (right figure).
For example, the inhibitory arrow from gene 1 to gene 2 must be interpreted via the
encoded TF 1 etc. Genes do not bind to the promoters of other genes.

4
Global Regulators in E. coli

Ma et al., BMC Bioinformatics 5 (2004) 199


Bioinformatics 3 – WS 22/23 V 12 – 5

This is a link to this paper:


https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-5-199
The GRN of E.coli is one of the best studied organisms.
Shown here are 10 global regulators (left column) and the number of target genes /
operons regulated by them. Note that in prokaryotes, multiple genes are organized
into operons that are under the control of a common promoter. The „function“ listed
in the right column is the dominant function of the global regulator. Alternatively, one
could also list the functions of the regulated target genes.

5
Simple organisms have hierarchical GRNs
Lowest level: operons that code for TFs with only auto-
Largest weakly connected component regulation, or no TFs
(WCC)
Next layer: delete nodes of lower layer, identify TFs that do not
(ignore directions of regulation): 325 regulate other operons in this layer (only lower layers)
operons Continue …
(3/4 of the complete network)

Network from standard Network with all regulatory



layout algorithm edges pointing downwards

→ a few global regulators (•) control all the details


Ma et al., BMC Bioinformatics 5 (2004) 199
Bioinformatics 3 – WS 22/23 V 12 – 6

Shown on the left is the GRN as obtained e.g. from force-directed layout. Shown on
the right is a different visualization that emphasizes the hierarchical organization.
Global regulators have no incoming edges. They regulate other genes „top down“.

6
E.coli GRN modules

Remove top 3 layers and determine WCCs


→ just a few modules

Ma et al., BMC Bioinformatics 5 (2004) 199


Bioinformatics 3 – WS 22/23 V 12 – 7

When the top 3 layers are removed, there is a clear modular structure. Weakly
connected components (connected by undirected edges) are colored in one color.

7
Putting it back together

The 10 global
regulators are at the
core of the network,
some hierarchies exist
between the modules

Ma et al., BMC Bioinformatics 5 (2004) 199


Bioinformatics 3 – WS 22/23 V 12 – 8

This is another view of the full GRN where the global regulators are placed at the
center. Functional modules in the transcriptional regulatory network of E.
coli . Operons in different modules are shown in different colors. The ten global
regulators form the core part of the network. The periphery modules are connected
mainly through the global regulators.

8
Modules have specific functions

Ma et al., BMC Bioinformatics 5 (2004) 199


Bioinformatics 3 – WS 22/23 V 12 – 9

The 10 global regulators regulate 33 separate „modules“ that each consist of several
operons.

9
Frequency of co-regulation Martinez-Antonio,
Collado-Vides,
Half of all target genes are regulated by multiple TFs.
Curr Opin Microbiol 6,
In most cases, a „gobal“ regulator (with > 10 interactions) 482 (2003)
works together with a more specific local regulator.

10
Bioinformatics 3 – WS 22/23 V 12 –

This is a link to this paper:


https://www.sciencedirect.com/science/article/abs/pii/S1369527403001139?via%3D
ihub
The group of Julio Collado-Vides / Mexico has compiled since 1998 the GRN of E.coli
strain K12, see http://regulondb.ccg.unam.mx/
CRP (cyclic AMP receptor protein or catabolite activator protein) is the most
important TF, followed by IHF (integration host factor) and FNR (fumarate and nitrate
reductase). Global regulators often act together with other TFs, so-called co-
regulators. Also, they regulator other TFs. Sigma factors are those RNA polymerase-
recruiting factors that control the target genes of the global regulators.

10
TF regulatory network in E.coli
When more than one TF
regulates a gene, the order of
their binding sites is as given
in the figure.

Arrowheads and horizontal


bars indicate positive /
negative regulation when the
position of the binding site is
known.

In cases where only the


nature of regulation is
known, without binding site
information, + and – are used
The names of global regulators are in bold.
to indicate positive and
negative regulation.
Babu, Teichmann, Nucl. Acid Res. 31, 1234 (2003)

Bioinformatics 3 – WS 22/23 V 12 – 11

This is the link to this paper:


https://academic.oup.com/nar/article/31/4/1234/2375876
This visualization of the GRN of E.coli resembles the layout of an electronic chip. For
38 of the TFs, some of the regulated genes are themselves TFs. This figure shows the
network of TFs currently known to regulate each other in E.coli . CRP controls 18
different TFs apart from itself. Two other TFs, FNR and ArcA, regulate four TFs and FIS
and IHF (himA and himD) regulate three TFs. CRP is a global sensor of food levels in
the environment. FNR and ArcA are involved in sensing the redox status of the cell to
regulate genes involved in respiration under aerobic and anaerobic conditions. FIS
and IHF-like are general enhancers, which frequently act together with other TFs to
regulate genes. Thus the TFs involved in respiration and growth are those which
regulate the most TF.

11
Response to changes in environmental
conditions
TFs sense changes in environmental conditions or other changes that encode
internal signals.

Global environment growth conditions in which TFs are regulating.


# in brackets indicates how many additional TFs participate in the same
number of conditions.

Martinez-Antonio, Collado-Vides, Curr Opin Microbiol 6, 482 (2003)

Bioinformatics 3 – WS 22/23 12
V 12 –

As mentioned in lecture V9, 75% of the TFs are 2-domain proteins, one DNA-binding
domain and one regulatory domain. TFs sense changes in environmental conditions
or other internal signals encoding changes. Bacteria constantly monitor extracellular
physicochemical conditions, so that they can respond by modifying their gene
expression patterns to adjust growth. The link between changes in environmental
conditions and changes in transcriptional regulation involves signal-transduction
pathways. The precise link with regulatory proteins is achieved by small metabolites.
Half of the known and predicted transcriptional regulators in E. coli have predicted
domains for the binding of small metabolites. The role of both local and global
regulators is to mediate precise activation or repression by sensing changes in specific
metabolites. The only difference is that global TFs sense a larger number of growth
conditions.

12
Story: Quorum sensing of Vibrio fischeri
V. fischeri has a microbial symbiotic relationship with the squid Euprymna
scolopes.
The bacterium exists in small amounts in the ocean (102 cells/ml) and in large
amount in the light organs of the squid (1010 cells/ml).

At low concentrations, V. fischeri does not produce luminescence.

At high cell density these bacteria emit a blue-green light.

The light organ of the squid provides to the bacteria all the nutrients
that they need to survive.

The squid benefits from the bacteria's quorum sensing and bioluminescence
abilities.

https://www.bio.cmu.edu/courses/03441/TermPapers/99TermPapers/Quorum/vibrio_fischeri.html
Bioinformatics 3 – WS 22/23 V 12 – 13

Vibrio fischerii is a bioluminescent bacterium in the ocean. It is widely used to test


the toxicity of chemicals. We will discuss an application where we modeled the
expression of the light-producing proteins by a Boolean network.

13
Quorum sensing of Vibrio fischeri
The cell density-dependent control of gene expression is activated by a
transcriptional activator protein that is coupled to a signal molecule
(autoinducer).

The autoinducer is released by the bacteria into its surrounding environment and
taken up from there.
The Hawaiian
bobtail squid,
Euprymna
scolopes.

During the day, the squid keeps the bacteria at lower concentrations by expelling
some of them into the ocean during regular intervals.

At night however, the bacteria are allowed to accumulate to about


1010 cells/ml so that they will emit blue-green light.

https://www.bio.cmu.edu/courses/03441/TermPapers/99TermPapers/Quorum/vibrio_fischeri.html
www.wikipedia.org

Bioinformatics 3 – WS 22/23 V 12 – 14

Wikipedia:
„Aliivibrio fischeri (also called Vibrio fischeri) is a Gram-negative, rod-shaped
bacterium found globally in marine environments. This species has bioluminescent
properties, and is found predominantly in symbiosis with various marine animals,
such as the Hawaiian bobtail squid. It is named after Bernhard Fischer, a German
microbiologist. “

14
Vibrio fischeri helps with Camouflage
This is perfect for the squid because it is a night feeder.

In the moonlight, the swimming squid would normally cast a shadow beneath
itself making it a perfect target for squid-eating organisms.

However, the bacterial glow will counter the


shadowing effect the moon makes and mask the
squid from its predators.

In the morning, the squid expels some bacteria


into the ocean to a concentration where they will
not generate light anymore so as to conserve
energy.

https://www.bio.cmu.edu/courses/03441/TermPapers/99TermPapers/Quorum/vibrio_fischeri.html
Image from quantamagazine.org
Bioinformatics 3 – WS 22/23 V 12 – 15

Here, you can read more about the symbiotic relationship of bobtails and this
bacterium: https://www.biointeractive.org/sites/default/files/BobtailSquid-Educator-
film.pdf

15
Quorum sensing of Vibrio fischeri

AI

LuxR LuxR
LuxA
LuxI LuxB
LuxA
LuxR LuxB

luxR luxICDABE

Bioinformatics 3 – WS 22/23 V 12 – 16

Top left: bioluminescence is generated by the complex of LuxA:LuxB proteins. They


are read from the LuxICDABE operon. The luxI protein produces the small
autoinducer chemicals (red hexagons). These can pass the cell membrane and build
up a concentration gradient. The luxR transcriptional regulator LuxR is activated by
binding of AI.
Bottom right: This schema illustrates the same regulatory principles as the top left
picture. It specifies two different autoinducer molecules (blue and green). The green
one is a homo-serine lactone (HSL) that binds to LuxR. In our Boolean network, we
will focus on the right half and neglect the left half.

16
Boolean Networks
Dependencies between variables can be formulated as conditional transitions

• "If LuxI is present, then AI will be produced…"


• "If there is AI and there's no LuxR:AI bound to the genome, then LuxR
will be expressed and complexes can form…"
• "If LuxR:AI is bound to the genome, then LuxI is expressed…"

Simplified mathematical description of the dependencies:

Densities of the species <=> discrete states: on/off, 1/0


Network of dependencies <=> condition tables
Progress in time <=> discrete propagation steps

Bioinformatics 3 – WS 22/23 V 12 – 17

In a Boolean network, variables have discrete states. In the simplest models, they can
only have the states on/off or 1/0.
The dependencies between variables can result in state changes.

17
Boolean Networks II
State of the system: described by vector of discrete values
Si = {0, 1, 1, 0, 0, 1, …}

Si = {x1(i), x2(i), x3(i), …}

fixed number of species with finite number of states each


→ finite number of system states
→ periodic trajectories
→ periodic sequence of states = attractor
→ all states leading to an attractor = basin of attraction

Propagation:
Si+1 = {x1(i+1), x2(i+1), x3(i+1), …}
x1(i+1) = f1(x1(i), x2(i), x3(i), …) with fi given by condition tables

Bioinformatics 3 – WS 22/23 V 12 – 18

A simulation will start from a particular initial condition, where all variables are set
either to 1 or 0. The state of the system is then described by a vector holding the
values of all variables.

18
A Small Example
State vector S = {A, B, C} → 8 possible states

Conditional evolution:
A is on if C is on A activates B C is on if (B is on && A is off)
Ai+1 Ci Bi+1 Ai Ci+1 Ai Bi
0 0 0 0 0 0 0
1 1 1 1 1 0 1
0 1 0
0 1 1
Start from {A, B, C} = {1, 0, 0}
assume here that
# Si A B C
inhibition through A
0 S0 1 0 0 is stronger than
1 S1 0 1 0 activation via B

2 S2 0 0 1 periodic orbit of length 3


3 S3 = S0 1 0 0

Bioinformatics 3 – WS 22/23 V 12 – 19

Let‘s consider the regulatory relationships in the top right picture. 3 variables can
each be in 2 states (0/1) → there are 23 = 8 states.
The three blue matrices in the second row are condition tables describing the state of
the left variable in iteration i+1 as a function of the right variables at iteration i.
One of the conditions is „C activates A“. Thus, if Ci is off, A will also be off in the
next iteration. If Ci is on, A will also be on in the next iteration.
The green matrix at the bottom describes the time evolution starting from one
particular iteration. When we start from 1/0/0, the system oscillates and visits two
other states before it returns to the same initial state. Since the Boolean network is
deterministic, the second cycle would be identical to the first one. Thus, from 8
possible states we have already visited 3 states.

19
Test the Other Starting Conditions
Test the other states Ai+1 Ci Bi+1 Ai Ci+1 Ai Bi
0 0 0 0 0 0 0
# A B C 1 1 1 1 1 0 1
0 1 1 1 0 1 0
1 1 1 0 0 1 1
# A B C
2 0 1 0
0 1 0 1
3 0 0 1 # A B C
1 1 1 0
4 1 0 0 0 0 1 1
5 0 1 0 1 1 0 1

Same attractor as before:


# A B C
100 → 010 → 001 → 100
0 0 0 0
is also reached from: 1 0 0 0
110, 111, 101, 011
→ Either all off or stable oscillations

Bioinformatics 3 – WS 22/23 V 12 – 20

Here, we also study what happens when we start from the 5 other states. If we start
from 1/1/1, the system converges to the same cycle that we saw on the previous
slide. The 0/0/0 state does not oscillate. The other two states also transition to the
same periodic cycle.

20
A Knock-out Mutant
Ai+1 Ci Bi+1 Ai Ci+1 Bi
0 0 0 0 0 0
1 1 1 1 1 1

Attractors:

# A B C # A B C # A B C
0 1 0 0 0 1 1 0 0 1 1 1
1 0 1 0 1 0 1 1 1 1 1 1
2 0 0 1 2 1 0 1
3 1 0 0 3 1 1 0 # A B C
0 0 0 0
no feedback 1 0 0 0
→ no stabilization, network just "rotates"

Bioinformatics 3 – WS 22/23 V 12 – 21

If we remove the regulatory feedback, the condition tables are all identical. Now,
there are 4 possible cycles.

21
Boolean Network of QS
Minimum set of species:
AI

LuxR, AI, LuxR:AI, LuxR:AI:genome, LuxI


LuxR LuxR LuxA
LuxI LuxB

Here: Light signal (LuxAB) α LuxI


LuxA
LuxR LuxB

luxR luxICDABE

Condition tables: describe the state of a species in the next step


given the current states of all relevant species.

LuxI LuxR:AI:Genome LuxR:AI:Genome LuxR:AI

0 0 0 0
1 1 1 1
How does LuxI depend on How does LuxR:AI:Genome depend on
LuxR:AI:Genome? LuxR:AI?

Bioinformatics 3 – WS 22/23 V 12 – 22

Now we come back to the Vibrio fischeri example. The aim is always to setup the
most simple Boolean network. Therefore, we omit the light producing proteins
LuxA/B because they are read from the same operon as LuxI. Thus, their
concentration can be assumed to be proportional to that of LuxI.
In our model, we don‘t represent the unbinding (dissociation) of LuxR:AI from the
genome. In principle, this binding reaction would be reversible, which would make
the model more realistic. The idea is always to simplify all steps.

22
Condition Tables for QS II
LuxR LuxR AI LuxR:AI:Genome
AI

1 0 0 0 When LuxR:AI:Genome is empty,


LuxR is produced in next step
LuxR LuxR

LuxI
1 1 0 0
LuxR
1 0 1 0
1 1 1 0
luxR luxICDABE
0 0 0 1
1 1 0 1Comment: LuxR present, no AI available
0 0 1 1
0 1 1 1LuxR present, binds AI in next step,
no LuxR is produced because
LuxR:AI:Genome inhibits LuxR production
LuxR:AI LuxR AI LuxR:AI:Genome
0 0 0 0
0 1 0 0
0 0 1 0
1 1 1 0
0 0 0 1
0 1 0 1
0 0 1 1
1 1 1 1

Bioinformatics 3 – WS 22/23 V 12 – 23

These are the condition tables for LuxR and the complex of LuxR with autoinducer
(LuxR:AI). When LuxR and AI are both present, the LuxR:AI complex forms in the next
step.

23
Condition tables for QS III

AI

LuxR LuxR

LuxI

LuxR

luxR luxICDABE

AI LuxR AI LuxI
0 0 0 0
0 1 0 0 AI LuxR AI LuxI
1 0 1 0 1 x x 1
0 1 1 0 → 0 x 0 0
1 0 0 1 1 0 1 0
1 1 0 1 0 1 1 0
1 0 1 1
1 1 1 1

Bioinformatics 3 – WS 22/23 V 12 – 24

The condition table for AI can be simplified. If LuxI is on, AI is set on irrespective of
the states of LuxR and AI (first row in right table).

24
Scanning for Attractors
States of V. fischeri QS system are mapped onto integers

{LuxR (LR), LuxR:AI (RA), AI, LuxR:AI:Genome (RAG), LuxI (LI)}


= {1, 2, 4, 8, 16} - current state can be interpreted as binary number!

For each attractor:


• periodic orbit and its length (period)
• basin of attraction and its relative size (32 states in total)
→ how likely will the system end up in each of the attractors?

Attractor 1: orbit: 1 → period 1


states: 0, 1 → size 2, 2/32 = 6.25 %

start from state 0: # LR RA AI RAG LI - state


0 . . . . . - 0
1 X . . . . - 1 <= attractor
2 X . . . . - 1 States: named by reading occupancies as binary
numbers in reversed order.

Bioinformatics 3 – WS 22/23 V 12 – 25

Periodic cyclic transitions are termed attractors. Their length (number of visited
states) is termed period. As basin of attraction we consider all initial states that
converge into one attractor. The simplied Vibrio fischeri system has 5 variables and
therefore 25 = 32 states.
If all variables are initially OFF, the production of LuxR is not inhibited, and LuxR will
be ON in the next iteration. Since no autoinducer exists, the system remains
indefinitely in this state. This is the first attractor. We will denote the states as binary
numbers that we read in reverse order. For example, the state 1/0/0/0/0 read as
binary number in reversed order (00001) corresponds to the binary number „1“.

25
Scanning for Attractors II
Attractor 2: orbit: 3, 9, 17, 5 → period 4

Its basin of attraction


includes states: 2, 3, 5, 8, 9, 16, 17 → size 7, 21.9 %
of all states
start from state 8: # LR RA AI RAG LI - state
0 . . . X . - 8
1 . . . . X - 16
2 X . X . . - 5
3 X X . . . - 3 Periodic attractor:
4 X . . X . - 9
5 X . . . X - 17
state 17 returns to 5
6 X . X . . - 5

averaged occupancies in this periodic orbit:

LR RA AI RAG LI
4/4 = 1 1/4 = 0.25 1/4 = 0.25 1/4 = 0.25 1/4 = 0.25

Bioinformatics 3 – WS 22/23 V 12 – 26

If we set RAG = ON, LuxI will be produced in the next iteration, followed by
production of LuxR and AI, then the complex RA forms, which then binds to the
genome (RAG), which again leads to synthesis of LuxI. From this state on, the system
remains in this attractor. Seven states lead to this attractor. Thus, its population is
7/32 = 21.9%.
In this periodic orbit, LuxR is always ON, the LuxR:AI complex is only formed in one
out of 4 states (state 3) etc. Thus one can compute the listed averaged occupancies.

26
Attractors III
Attractor 3: period 4, basin of 16 states → 50 %
# LR RA AI RAG LI
. X X . .
. X X X .
. . X X X
. . X . X

Attractor 4: period 4, basin of 4 states → 12.5 %


# LR RA AI RAG LI
X X X . .
X X . X .
X . . X X
X . X . X

Attractor 5: period 2, basin of 3 states → 9.4 %


# LR RA AI RAG LI
X . X X .
. X . X

Bioinformatics 3 – WS 22/23 V 12 – 27

These are 3 further attractors.

27
Classifying the Attractors
→ Interpret the system's behavior from the properties of the attractors

Attractor period basin size <LuxR> <LuxR:AI> <AI> <LuxR:AI:Gen> <LuxI>

1 1 6.25 % (2) 1 0 0 0 0

2 4 21.9% (7) 1 0.25 0.25 0.25 0.25

3 4 50 % (16) 0 0.5 1 0.5 0.5


4 4 12.5 % (4) 1 0.5 0.5 0.5 0.5
5 2 9.4% (3) 0.5 0.5 0.5 0.5 0.5

There exist three regimes:

dark: LuxI = 0 intermediate: LuxI = 0.25 bright: LuxI = 0.5


little free LuxR (0.24) +
free LuxR, no AI free LuxR + little AI
much AI (0.85)

Bioinformatics 3 – WS 22/23 V 12 – 28

Altogether, there are 5 attractors. We can characterize them by their brightness (state
of LuxI). These 3 states correspond to the luminescent states of Vibrio fischeri in its
symbiosis with the bobtail squid.

28
The Feed-Forward-Loop
Signal propagation
External signal determines state of X Left column: external signal
→ response Z for short and long signals X X Y Z
0 0 0
1 0 0
Short
condition tables: 0 1 0 Signal
Y X Z X Y 0 0 0
0 0 1 0 0
0 0 0 Long
1 1 0

Response to signal X(t)


1 1 0 0 1
1 1 1 signal
0 1 0 0 1 1
1 1 1 0 0 0
0 0 0

X Y Z
Y X Z X Y 0 1 0
1 0 0 0 0 1 1 0
0 1 0 0 1 0 0 0
1 1 0 0 1 0
1 1 0
0 1 1
1 0 0
1 0 1
0 0 1
0 1 1
Bioinformatics 3 – WS 22/23 0 1 0 V 12 – 29

This is an example how a feed-forward loop (top left system) can determine whether
an activating signal has a sufficient length. A short activation (setting X = ON for only
1 step, see top right example) is not transmitted to Z. Only when X is activated long
enough, Z is turned ON. The same thing happens in the lower example where
activations are replaced by repressions. In this case, the response to the long signal
has exactly the same length.

29
Can Boolean Networks be predictive?
Generally: → quality of the results depends on the quality of the model
→ quality of the model depends on the quality of the assumptions

Assumptions for the Boolean network description:

(• subset of the species considered → reduced system state space)

• only discrete density levels → dynamic balances lost,


reduced to oscillations
• conditional yes–no causality → no continuous processes

• discretized propagation steps → timing of concurrent paths?

"You get what you pay for"

Bioinformatics 3 – WS 22/23 V 12 – 30

This is a reflexion on the severe assumptions that we need to make in order to apply
Boolean networks. Can one use them to describe realistic systems?

30
Understand Blood development (hemato-
poeisis) with the help of Boolean Networks
Blood development represents one of the earliest
stages of organogenesis. The production of primitive
erythrocytes is required to support the growing embryo.

Blood has long served as a model to study organ


development owing to the accessibility of blood cells and
the availability of markers for specific cell populations.
Flk1 and Runx1 staining
Blood development is initiated at gastrulation from in E7.5 mesoderm and
blood band, respectively
multipotent Flk1+ mesodermal cells
(Flk1+ is a marker gene for this developmental stage.)

These cells initially have the potential to form either Moignard et al.,
blood, endothelium and smooth muscle cells. Nature Biotech.
33, 269 (2015)

Bioinformatics 3 – WS 22/23 V 12 – 31

This is a link to this article:


https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4374163/
The experimental work was done in the group of Bertie Göttgens/Cambridge. The
bioinformatics was done by the groups of Jasmin Fischer and Fabian Theis.

31
Early stages of hematopoesis

E7.5 early bud


https://embryology.
med.unsw.edu.au/
The first wave of primitive hematopoiesis originates from Flk1+ mesoderm,
with all hematopoietic potential in the mouse contained within
the Flk1+ population from E7.0 onwards.

In this study, cells were flow sorted into single Flk1+ cells at E7.0 (primitive
streak, PS), E7.5 (neural plate, NP) and E7.75 (head fold, HF) stages.

E8.25 cells were subdivided into putative blood and endothelial


populations by isolating GFP+ cells (four somite, 4SG) and Flk1+GFP−
cells (4SFG−), respectively

Bioinformatics 3 – WS 22/23 V 12 – 32

E7.0 is day 7 in mouse development. The top right figure shows the embryo at E7.5,
see
https://embryology.med.unsw.edu.au/embryology/index.php/Mouse_Timeline_Detai
led#Day_7_.28E7.0.29
At E7.5, early blood formation starts. Until then, the embryo was purely fed from the
mother.
Cell sorting is done based on FACS analysis.

32
Studied cells
Cells were sorted from multiple
embryos at each time point.
3,934 cells were analyzed in total.

Total cell numbers and numbers of


cells of different stages present in
each embryo were estimated from
fluorescence-activated cell sorting
(FACS) data.

Number of cells/embryo grows as


Moignard et al., embryonic development progresses.
Nature Biotech.
33, 269 (2015)

Bioinformatics 3 – WS 22/23 V 12 – 33

For the initial stages, more embryos were needed because they are smaller and have
fewer cells (bottom right picture)

33
Assay gene expression in single cells
Gene expression in single cells
assayed with PCR for:
- 33 transcription factors known to be
involved in endothelial and
hematopoietic development
- 9 marker genes (needed for FACS-
sorting)
- 4 house-keeping genes (needed
for quality checks and normalization)
Cells were discarded that did not
express all 4 house-keeping genes,
or for which their expression was
more than 3 standard deviations
from the mean.

www.fluidigm.com
Moignard et al.,
Nature Biotech.
33, 269 (2015)

Bioinformatics 3 – WS 22/23 V 12 – 34

In 2015, single-cell sequencing was still very new. As an alternative, the authors
monitored the expression levels of 46 genes by qRT-PCR, 33 of them TFs known to be
important for endothelial and hematopoietic development.

34
Hierarchical clustering of gene expression data
3 main clusters:

Cluster I (right side)


contains mostly PS and
Color code NP cells (green/blue)

Cluster III contains


exclusively 4SG cells (red)

Cluster II (left side) is


mixed (NF, 4SFG- , …)

 Cell differentiation
progresses
asynchronously

Moignard et al., ← Single cells →


Nature Biotech.
33, 269 (2015)
35
Bioinformatics 3 – WS 22/23 V 12 – 35

On the y-axis are the 33 TFs and the 9 marker genes. On the x-axis are all analyzed
single cells.
Hierarchical clustering using Spearman rank correlation and complete linkage yielded
3 main clusters shown at the top (labeled I, II and III).

35
Dimensionality reduction: diffusion maps
Similarity of expression x in cells i and j :

P(i,j) is normalized so that

The cells are organized in 2D or 3D such


that the Euclidean distance between the
cells corresponds to the diffusion metric P(i,j)
.
The quantity P(i,j) can then be interpreted as
the transition probability of a diffusion
process between cells.

Axes: eigenvectors of matrix P with largest


eigenvalues.
Moignard et al.,
Nature Biotech.
Bioinformatics 3 – WS 22/23 33, 269 (2015) V 12 – 36

These are so-called diffusion maps to illustrate the developmental progression along
three principal components. The idea is that cells „diffuse“ along a developmental
trajectory and may bifurcate at developmental branching points (here: state HF). The
matrix P measures the similarity of expression between pairs of cells by the so-called
„diffusion metric“ defined by the formula.
ε can be interpreted as time in a diffusional process.

36
Who regulates hematopoiesis?
Design Boolean Network

Determine suitable expression thresholds for each gene to categorize its


expression levels into binary on / off states.

Note that very few of the possible states have been observed.

Moignard et al.,
Nature Biotech.
33, 269 (2015)

Bioinformatics 3 – WS 22/23 V 12 – 37

The 33 TFs have 8.5 billion possible states, but only about 4000 states were measured
as single cells. Thus, one cannot consider all possible states. A way out is to only
consider the occupied states and transitions between them.

37
State graph of largest connected comp.

Moignard et al.,
Nature Biotech.
33, 269 (2015)

State graph (largest connected component) of 1448 states reaching all 5 stages.

Add edges to connect all those pairs of states that differ in the on/off levels of a
single gene (and are identical otherwise), see right side with labeled edges.

Idea behind this: these transitions can be best interpreted.


Bioinformatics 3 – WS 22/23 V 12 – 38

This network represents the largest connected component of 1448 cells / states. All 5
stages (PS, NP, HF, 4SG, 4SFG-) are populated. All those states are connected by edges
that differ only in the expression level of 1 TF (ON/OFF).

38
Automatic derivation of rules for
Boolean Network
We are given:

- a set of variables V, corresponding to genes,

- an undirected graph G = (N,E)


where each node n ∈ N is labeled with a state s:V→{0,1}, and
each edge {s1,s2} ∈ E is labeled with the single variable
that changes between state s1 and s2.

We are also given a designated set I  N of initial vertices


and a designated set F  N of final vertices,
along with a threshold ti for each variable vi ∈ V.

Moignard et al.,
Nature Biotech.
33, 269 (2015)

Bioinformatics 3 – WS 22/23 V 12 – 39

The authors applied a logical scheme termed SAT to determine automatically from
the data the logical rules of the observed state transitions.

39
Optimality criteria for rules
The rule synthesis method searches for an update function ui:{0,1}n→{0,1}
for each variable vi∈V, such that the following conditions hold:

1. For each edge (s1,s2) labeled with variable vi in the orientated graph,
the update function for vi takes state s1 to state s2: ui(s1) = s2(i).

2. The number of states is maximized in which no transitions induced by


the update functions are missing.

3. Every final vertex f ∈ F is reachable from some initial vertex i ∈ I by a


directed path in the orientated graph.

Moignard et al.,
Nature Biotech.
33, 269 (2015)

Bioinformatics 3 – WS 22/23 V 12 – 40

This is a so-called re-engineering problem. We know the states that were visited
during the developmental progression, but we don‘t know the logic which TF
regulates which other TF. These rules need to be derived from the data.

40
Allowed complexity of the rules
The update function ui is restricted to have the form:
f 1  ¬f2
where fj is a Boolean formula that has
and-nodes of in-degree two,
or-nodes of arbitrary in-degree, and
where f1 has a maximum depth of Ni and f2 has a maximum depth of Mi.

Ni and Mi are given as parameters to the method.

The search for edge orientations and associated Boolean update rules is
encoded as a Boolean satisfiability (SAT) problem.

Moignard et al.,
Nature Biotech.
33, 269 (2015)

Bioinformatics 3 – WS 22/23 V 12 – 41

We will not go into the issue of solving SAT problems here. This is quite involved, see
e.g. http://cse.unl.edu/~choueiry/Documents/SATSolvers-KR-book-draft-07.pdf

41
Generated rules for Boolean Network
Additional validity check of
the postulated rules:

check whether regulated


genes contain TF-binding
motifs in their promoters
(right column).

This is the case for 70% of


the rules.

Moignard et al.,
Nature Biotech.
33, 269 (2015)

Bioinformatics 3 – WS 22/23 V 12 – 42

The table lists in column 2 the synthetically generated update functions (condition
tables). The entries of column 3 (% non-observed transitions disallowed) likely mean
that the generated update functions violate almost no observed transitions. The right
column checks if the target gene listed in column 1 contains in its promoter a binding
motif for the TFs present in the update function.

42
Core network controlling hematopoiesis

Derived core network of 20 TFs.

Red edges: activation Moignard et al.,


Blue edges: repression Nature Biotech.
33, 269 (2015)

Bioinformatics 3 – WS 22/23 V 12 – 43

From the full network, the authors extracted a core network of 20 TFs with
endothelial and blood-associated gene modules centered on Sox7, Hoxb4 and Erg,
and on Gata1 and Spi1 (PU.1), respectively, see blue dashed ellipsoids.

43
Predict effects of perturbations as validation
Authors simulated overexpression and knockout experiments for each TF.
Assessed ability of the network to reach wildtype or new stable states.

Red : gene expressed;


blue : gene not expressed.
S2-S6: blood-like
S7: endothelial-like
S8 : no activity

Network stable states for wt and Sox7 overexpression.

Enforced expression of Sox7 (that is normally downregulated) stabilized the


endothelial module and disabled the ability to reach any of the blood-like states.

Sox7 is predicted to regulate more targets than any other TF,


suggesting that perturbing its expression could have
Moignard et al.,
important downstream consequences. Nature Biotech.
33, 269 (2015)

Bioinformatics 3 – WS 22/23 V 12 – 44

How could one validate the correctness of the Boolean Network and its predictivity?
The authors performed additional simulations mimicking the overexpression or
knockout of each TF separately, see table S5. Overexpression of Sox7 was predicted
to prevent the formation of blood-like states.

44
Control experiments Moignard et al.,
Nature Biotech.
(b) Colony assays with or without doxycycline 33, 269 (2015)
from genotyped E8.25 embryos from
iSox7+rtTA+ mice crossed with wild types.

(c) Quantification of primitive erythroid


colonies after 4 days.

Embryos carrying both transgenes


(rtTA/iSox7) showed a 50% reduction of
primitive erythroid colony formation
following doxycycline-induced Sox7
expression compared to controls.

This suggests, in agreement with modeling


data and gene expression patterns, that
In iSox7-mouse, overexpression of Sox7 is
downregulation of Sox7 is important for stimulated by inducing the Sox7-promoter
the specification of primitive erythroid cells. by addition of the chemical doxycycline
(+Dox).

Bioinformatics 3 – WS 22/23 V 12 – 45

This is a control experiment with transgenic mice where the expression of the Sox7
gene is put under the control of applying the chemical doxycycline. In the embryonic
cells collected from these mice, Sox7 can be over-expressed. Then, only half of the
normal amount of erythroid cells are produced. This is as expected since the embryo
is a cross of a wild-type mother with a transgenic father.

45
Conclusions
Cells destined to become blood and endothelium arise at all stages of the
analyzed time course rather than in a synchronized fashion at one precise
time point. This is consistent with the gradual nature of gastrulation.

Using an automated Boolean Network synthesis toolkit, a core network of


20 highly connected TFs was identified which could reach 8 stable states
representing blood and endothelium.

The model predictions could be validated by demonstrating e.g. that Sox7


blocks primitive erythroid development.

→ Boolean Networks can be predictive and may guide experiments.

Moignard et al.,
Nature Biotech.
33, 269 (2015)

Bioinformatics 3 – WS 22/23 V 12 – 46

Today, we gave an introduction to gene-regulatory networks and discussed the


Boolean Network method that is frequently used to model the dynamic nature of
regulatory networks.

46

You might also like