igate this problem, the genome contains additional DNA elements, insulators, that block enhancer-
promoter interactions. However, there is an open problem with how the insulation works, especially
as enhancer-insulator pairs may be separated by millions of base pairs. Based on recent empirical
data from Hi-C experiments, this paper proposes a new mechanism that challenges the common
paradigm that rests on specific insulator-insulator interactions. Instead, this paper introduces a
stochastic looping model where enhancers bind weakly to surrounding chromatin. After calibrating
the model to experimental data, we use simulations to study the broad distribution of hitting times
between an enhancer and a promoter when there are blocking insulators. In some cases, there is
a large difference between average and most probable hitting times, making it difficult to assign a
typical time scale, hinting at highly defocused regulation times. We also map our computational
model onto a resetting problem that allows us to derive several analytical results. Besides offering
new insights into enhancer-insulator interactions, our paper advances the understanding of gene
regulatory networks and causal connections between genome folding and gene activation.
topological insulator model needs revision (at least in Drosophila melanogaster). Building on this observation, our paper proposes an alternative mechanism where insulators bind weakly to surrounding chromatin rather than other insulators. We formulate our model on a lattice with stochastic looping dynamics, calibrate the rates to existing experimental data, and benchmark against measured contacts from [14]. Next, we use our model to study the dynamics of enhancer-promoter hitting times and show that the average may deviate substantially from the typical (most probable) value. We also map our computational model onto a resetting problem and provide several analytical results. Our work offers new insights into the enhancer-insulator mechanics. Besides yielding a better understanding of gene regulatory networks, knowing how insulators work may help unveil causal relationships connecting gene expression and genome folding [15, 16].

II. METHODS

A. Looping model for enhancer-insulator dynamics

We represent the chromatin as an array of sites $i = 1, \ldots, N$, where each site symbolizes a nucleosome (≈ 175 bp), the basic chromatin unit. This choice is natural from a biological (or epigenetic) point of view [19–21] but not critical for our general framework. The array has four site types: enhancers, promoters, insulators, and regular chromatin (Fig. 1). In most simulations, we consider one enhancer and one promoter, typically placed 20-50 array indices apart (≈ 3.5–8.5 × 10^3 bp), and up to twenty insulators. However, actual gene clusters usually have a much richer arrangement where several enhancers and insulators act in concert to ensure proper expression of many genes. But to better appreciate the model, we study simpler configurations.

As mentioned above, enhancers are DNA elements that attract regulatory proteins, such as transcription factors. While attached to DNA, the transcription factors try to find the designated promoter to regulate transcription. In other words, protein-bound enhancers make repeated looping attempts with surrounding chromatin until they reach the target site.

Our model treats the promoter as an absorbing point, and the simulation stops once the enhancer complex reaches it (i.e., infinite reaction rate). This contrasts with the interaction with surrounding chromatin, which is much weaker. We model enhancer-chromatin binding using standard transition-state theory where the enhancer may be bound with energy $E_b$ or loosely associated ($E_a$), from where it may detach (Fig. 1). These two states ("bound" and "associated") are similar to the "search" and "recognition" modes often used to represent two-state searchers to explain the so-called speed-stability paradox in DNA-target search problems [22]. In addition to $E_a$ and $E_b$, there is an energy barrier $\Delta G_b$ separating the bound and associated states (called activation energy in transition-state theory). Below (Sec. III A), we estimate these parameters from actual transcription factor binding data.

In addition to surrounding chromatin, the enhancer also binds to insulators. We model this binding using transition-state theory but assign different values for $E_a$, $E_b$, and $\Delta G_b$. Importantly, because the insulators are dynamic objects like the enhancer, they form loops with surrounding chromatin (discussed below), meaning these energies change with site index $j$ (and time). Therefore, for some fixed insulator configuration, we express the unbinding and binding rates as

\[
\begin{aligned}
k_u(j) &= \gamma e^{-(\Delta G_b(j) - E_b(j))},\\
k_b(j) &= \gamma e^{-(\Delta G_b(j) - E_a(j))},
\end{aligned}
\tag{1}
\]

where $\gamma$ is the basal binding rate, on the order of $10^7$ s$^{-1}$ [23, 24], and the thermal energy $k_B T$ is set to unity. Since $k_b(j)/k_u(j) = \exp(E_a(j) - E_b(j))$, these rates obey detailed balance.

Next, we discuss the enhancer's looping rates. Similar to previous work [25–28], these rates depend on the entropy cost of forming the loop, $k_B \ln[(\ell/\ell_0)^{-\alpha}]$, where $\ell$ is the loop length, $\alpha$ is the looping exponent, and $\ell_0$ is a typical loop scale [18]. To calibrate constants to obey typical DNA-looping times, we again use transition-state theory and add a small binding activation energy $\Delta G_l$. Given these parameters, the looping on-rate from lattice site $i$ to $j$ is

\[
k_l(j|i) =
\begin{cases}
\delta e^{-\alpha \ln(\ell_{ij}/\ell_0) - \Delta G_l} & \text{if } i \neq j,\\
0 & \text{if } i = j,
\end{cases}
\tag{2}
\]

where $\delta$ is the basal looping frequency, which is on the order of $10^3$ s$^{-1}$ [23], and $\ell_{ij} = d_{\rm nucl.} \times |i - j|$ is the loop length; $d_{\rm nucl.}$ is the nucleosome diameter (∼ 10 nm, or 175 bp).

The looping off-rate follows a formula similar to Eq. (1), but it contains the activation energy associated with the looped state, $\Delta G_l$, instead of $\Delta G_b(j)$ since we imagine this is the only state from which the loop can break apart (it is also unphysical that the loop rate depends on the position-dependent $E_a(j)$). Thus, to unloop from the bound state ($E_b(j)$), the enhancer must first become "associated" ($E_a(j)$) and then unloop. In summary, the looping off-rate is

\[
k_o(j|i) = \delta e^{-(\Delta G_l - E_a(j))}, \tag{3}
\]

where the two looping rates obey detailed balance, $k_l(j|i)/k_o(j|i) = \exp(-E_a(j) - \alpha \ln(\ell_{ij}/\ell_0))$.

Before showing how we calibrate the binding energies to experimental data, we make three comments. First, even if the discussion above was mostly about enhancers, we assume insulators follow the same dynamics (Eqs. (1)-(3)), albeit with slightly different energy parameters.
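As a rough illustration of Eqs. (1)-(3), the rates can be written as three small functions. This is a minimal sketch, not the authors' simulation code: the function names, the looping exponent `alpha = 2.0`, and the choice of lattice units for the loop length (`d_nucl = 1.0`) are assumptions made only for this example. The energies are the enhancer-insulator values quoted in Sec. III A, and the prefactors use the orders of magnitude given in the text.

```python
import math

GAMMA = 1e7  # basal binding rate [1/s], order of magnitude quoted in the text
DELTA = 1e3  # basal looping frequency [1/s], order of magnitude quoted in the text


def binding_rates(e_a, e_b, dg_b, gamma=GAMMA):
    """Eq. (1): return (k_u, k_b) for one site; energies in units of k_B T."""
    k_u = gamma * math.exp(-(dg_b - e_b))
    k_b = gamma * math.exp(-(dg_b - e_a))
    return k_u, k_b


def looping_on_rate(i, j, l0, alpha, dg_l=0.0, d_nucl=1.0, delta=DELTA):
    """Eq. (2): rate of forming a loop from site i to site j (zero for i == j)."""
    if i == j:
        return 0.0
    l_ij = d_nucl * abs(i - j)  # loop length; lattice units are an assumption here
    return delta * math.exp(-alpha * math.log(l_ij / l0) - dg_l)


def looping_off_rate(e_a, dg_l=0.0, delta=DELTA):
    """Eq. (3): rate of breaking the loop from the 'associated' state."""
    return delta * math.exp(-(dg_l - e_a))


# Enhancer-insulator energies quoted in Sec. III A (units of k_B T).
k_u, k_b = binding_rates(e_a=-6.34, e_b=-11.0, dg_b=-3.60)

# Hypothetical looping parameters (alpha is not specified in this section).
k_l = looping_on_rate(0, 30, l0=0.24, alpha=2.0)
k_o = looping_off_rate(e_a=-6.34)

# Detailed-balance ratios implied by Eqs. (1)-(3).
binding_ratio = k_b / k_u   # equals exp(E_a - E_b)
looping_ratio = k_l / k_o   # equals exp(-E_a - alpha * ln(l_ij / l0))
```

Dividing Eq. (2) by Eq. (3) cancels both the prefactor and the looping activation energy, so the looping ratio depends only on the association energy and the entropic looping cost.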
[Figure 1 graphic, panels (a)-(c). Labels: transcription factor, insulator, promoter, enhancer; energy landscape seen by enhancers along the sites, with "associated" and "bound" levels; scale bar 150 bp.]
FIG. 1. Schematics of enhancer-promoter-insulator constructs on DNA (a) and our three-state looping model (b)-(c). (a)
The insulator region blocks long-ranged enhancer-promoter interactions. Reducing such interactions causes ”insulation” as
the transcription factor cannot physically assist the transcription machinery (e.g., RNA polymerases) that interacts with the
promoter. (b) Three-state lattice model. We model the chromatin as a sequence of N sites. A few of these sites represent
enhancers (green) or insulators (orange). They can loop and associate to any other site with the rate kl (j|i). This rate balances
looping entropy and binding energy, where this energy changes with their time-varying relative positions. The line graph below
shows a snapshot of the energy landscape experienced by the enhancer. This landscape has two ”dips” corresponding to the
insulator loop anchor. The other sites represent ”inactive” chromatin. Unlike enhancers and insulators, we consider chromatin
as an ensemble of interacting loops and include them by adjusting the enhancer-promoter looping exponent [17, 18]. We color
the promoter in blue. In the model, we treat the promoter as an absorbing target. (c) Local binding dynamics. After looping,
the enhancer and insulator can either strongly associate (bind) to the new site by kb or dissociate back to its original position
by ko (similar to a resetting process). If bound, the return rate to the associated state is ku .
For example, we assume that insulators interact weakly with chromatin. They do so with a binding constant that should not be smaller than $K \sim 10^2$ µM, which is the typical scale for specific binding (estimated from E. coli [29]). While our model does not rest on specific insulator-insulator interactions, they are not excluded. But they cannot be significantly larger than for general chromatin. If so, insulator pairs would appear as high-contact stripes in Hi-C maps, which was not observed in [14]. Therefore, we let these interactions have the same strength as regular insulator-chromatin interactions.

We also point out that the insulators in our simulations constantly form loops and, therefore, appear on two different lattice positions (Fig. 1(b)) from the point of view of the enhancer. One position is always fixed and coincides with the insulator's designated DNA segment. The second one represents the other end of the insulator-chromatin loop. This loop is short-lived, so this second position is highly dynamic and switches frequently and symmetrically around the insulator's primary position during the simulation (Fig. 7). This effect implies that the enhancer has two possibilities to bind the insulator (Fig. 1(c)).

Second, thus far, we have discussed DNA looping, omitting regular chromatin. But in reality, chromatin also fluctuates. Instead of introducing specific looping rates akin to Eqs. (2) and (3), we treat chromatin as an ensemble of interacting loops and modify the looping exponent $\alpha$ for enhancers and insulators accordingly. For long self-avoiding chains, the exponent (often denoted "ring factor") is

\[
\alpha \approx d\nu - 2\sigma, \tag{4}
\]
A. Matching model parameters to data

To calibrate the model to empirical data, we estimate $E_a(j)$, $E_b(j)$, and $\Delta G_b(j)$ for binding and association using comprehensive binding data for transcription factors, together with measured in vitro looping rates. We start with association and binding.

Previous work calculated the energy (and free-energy) landscape along DNA for individual transcription factors (TFs) that bind to sequence-specific motifs, typically 10-30 bp long [23, 31, 32]. But because DNA is so much longer (and there are only four base pair types), there are many instances of nearly identical binding sequences. This results in fluctuating genome-wide binding profiles interpreted as the TF's spatial probability density $p_{\rm TF}(x)$, where $x$ denotes the DNA coordinate. From this probability density, it is common to define the energy profile $E(x)$, assuming that $p_{\rm TF}(x) \sim \exp(-E(x))$. We defer to SII and [23] for technical details about obtaining $E(x)$ from a given TF target sequence. In SII, we plot the average binding data for about 300 TFs that we use to estimate $E_a(j)$, $E_b(j)$, and $\Delta G_b(j)$ for the enhancers and insulators. In our lattice notation, $E(x) \to E(j)$.

[Figure 2 graphic. (a) Energy-level diagram ($E_b$ level marked at −10.0). (b) y-axis: contact difference (mutant - control), normalized; x-axis: Distance across insulator [kb], 0 to 300; legend entries: Insulator rep. 1, Random rep. 2, Simulations.]

FIG. 2. Energy levels and fit to experimental data [14]. (a) The figure shows two typical interaction cases: high and low energy (green and orange). We found that the interaction energy is high for non-specific sites (grey sites in Fig. 1). This corresponds to a weak association energy ($E_a$) and a large barrier $\Delta G$ separating the bound ($E_b$) and associated state. This case differs from enhancers and insulators, where the interaction energy is low (orange lines). Here, the association and bound energy are both large, separated by a smaller energy barrier. (b) Recent experiments found that the contact difference between DNA regions separated by insulator-rich segments increases if the insulators are compromised (i.e., by genetically engineering a mutant lacking the CP190 binding factor that binds to insulators) [14]. We replot the empirical data here (with permission) as dotted blue and purple lines. The shaded areas represent ± one standard deviation. To match the data, we simulated a similar system with a center-placed insulator region and calculated the probability density functions of each site flanking the insulator. The y-axis is normalized by dividing by the maximum value of our simulations and the experimental data. By considering that each site is on the same scale as the Hi-C data used in experiments (5 kb), we found an excellent agreement between our simulations and the data considering a stickiness between the insulator-rich region and other regions ($E_{\rm sticky} = -17$) and fast looping dynamics ($l_0 = 0.4$).

To estimate $E_a(j)$, $E_b(j)$, and $\Delta G_b(j)$ for the enhancer, we use the formalism developed in [23] for a two-state TF flipping between "search" and "recognition" mode while searching for a DNA-target sequence. When in recognition mode, the TF is immobile, and the residence time depends on how similar the local and target sequences are to each other. In other words, the time depends on the depth of the landscape $E(j)$. When in search mode, the TF diffuses and only weakly interacts with the DNA. In [23], it is assumed that the effective energy landscape while diffusing is a scaled version of $E(j)$, i.e., $\rho E(j)$, where $\rho$ is adjusted to agree with measured 1D diffusion constants. In practice, this means setting $\rho \leq 0.3$; we use $\rho = 0.3$. Lastly, there is a free energy barrier $\Delta G_{RS}$ separating "search" and "recognition" mode that we also extract from TF-binding data (see below and in SII). In summary, the equations to calculate
$E_a(j)$, $E_b(j)$, and $\Delta G_b(j)$ read

\[
\begin{aligned}
E_a(j) &= \rho E(j),\\
E_b(j) &= \Delta G_{RS} + E(j),\\
\Delta G_b(j) &= \Delta G_{RS} + \frac{1+\rho}{2} E(j),
\end{aligned}
\tag{5}
\]

where all the parameters on the right-hand side come from empirical data. In particular, we used TF data from the JASPAR database [33] to extract $E(j)$ and $\Delta G_{RS}$. We apply these formulas to the three binding instances we have in our problem: enhancer-insulator, insulator-insulator, and unspecific (enhancer-chromatin and insulator-chromatin). We used $\Delta G_{RS} = 10.13$ in all three cases.

Enhancer-insulator binding. We calculated $\langle E(j) \rangle_{\rm binding\ sites}$ for all human chromosomes and available transcription factors from the JASPAR database [33]. Using Eq. (5), we find that the population median (${\rm Md}(\cdot)$) is ${\rm Md}(\langle E(j) \rangle_{\rm binding\ sites}) = -21.13$ (see SII). This gives $E_a(j) = -6.34$, $E_b(j) = -11.0$, and $\Delta G_b(j) = -3.60$, which is a system having low binding energy, corresponding to a low association energy and energy barrier (see Fig. 2).

Unspecific binding for enhancers. Here we set $E(j) = 0$. Plugging this into Eq. (5) gives $E_a(j) = 0$, $E_b(j) = 10.13$, and $\Delta G_b(j) = 10.13$. This is the situation where binding is rare and weak; see Fig. 2.

Unspecific binding for insulators. Here we set $E(j) = -15.0$. Plugging this into Eq. (5) gives $E_a(j) = -4.5$, $E_b(j) = -4.9$, and $\Delta G_b(j) = 0.39$.

Next, we match the looping rates for enhancers and insulators. To set $l_0$ and $\Delta G_l$, we use experimental data from [16, 34–36], reporting that typical looping times for 300-bp-long loops are $10^1$–$10^2$ seconds (LacI protein and restriction enzymes NaeI and NarI), and $10^3$ seconds for 3500-bp-long loops. Matching with our simulation gives $l_0 \approx 0.04$ and $\Delta G_l = 0.0$. However, the looping times in vitro may deviate substantially in crowded cell conditions [16], where far-away sections come in contact faster than expected. Therefore, we vary $l_0$ to study the fast- and slow-looping regimes.

B. Sticky insulators reproduce measured contact differences in Drosophila melanogaster embryos

In this section, we benchmark our model to empirical data from [14]. Using Hi-C experiments, this paper quantifies how effectively insulators block 3D interactions between flanking DNA segment pairs. By collecting contact profiles for hundreds of insulator positions in mutant and wild-type fruit fly embryos, the authors made two key observations. First, the insulators block 3D interactions over distances up to 200 kb. We replot the empirical data in Fig. 2(b) for two replicate experiments (Rep. 1 and Rep. 2). The background is derived from the same experiments but uses random loci far from any insulators. Second, they also measured 3D contacts between insulator pairs and could not detect any specific binding, which contradicts the standard topological model.

To reproduce the measured contact decay in Fig. 2(b) using our model, we constructed a large-scale version where each lattice site matches the resolution of the data (i.e., 5 kb), rather than a single nucleosome (∼ 0.2 kb). In the middle of the lattice, we placed an insulator-rich region (one site) that weakly associates with all other sites with the same energy ($E(j) = {\rm const.}$) and forms loops. To get the contact frequencies, we simulate repeated looping events of all flanking sites across the insulator according to rates $k_l(i|j)$ and $k_o$ (Eqs. (2) and (3)) and record the residence times between all site pairs. Next, we collected these residence times over $10^2$ simulations (≈ $10^6$ time steps each) and used them as a proxy for contact probabilities. To fit the model to the dotted lines in Fig. 2(b), we calculated the relative difference in these probabilities with and without the insulator (Fig. 2(b), black).

We note that our model agrees well with the empirical data using $E(j) = -17$ as unspecific binding for insulators and $l_0 = 0.4 \leftrightarrow 2000$ bp. This length scale agrees with typical coarse-grained "beads-on-a-string" polymer models for chromatin. For example, if modeled as a freely jointed chain, each monomer should contain 2000-25000 bp of DNA [37]. This value differs slightly from our previous fitting using in vitro looping times, where $l_0 = 0.2 \leftrightarrow 1000$ bp. In the remaining part of the paper, we return to a nucleosome-centric model and use $l_0 = 0.24$ and the unspecific binding $E(j) = -15$ as the standard settings (although we also study the effects of $l_0$ variations). In summary, this section shows that our model can reproduce empirical data from Hi-C measurements across an ensemble of insulators in Drosophila melanogaster.

C. Insulator densities strongly affect enhancer-promoter hitting frequencies

One of the paper's key aims is to better understand the hitting dynamics between the enhancer and promoter elements under insulation, as such an encounter is the primary step in transcription. In particular, we wish to calculate the distribution of time interval lengths between hitting events and study how they change with key variables such as insulator densities, positions, and binding strengths. To this end, we perform Gillespie simulations using the rates outlined in Sec. II A. The simulations use a 200-site lattice (≈ 35 kbp), where the enhancer and promoter reside in the middle, separated by 30 sites in the standard setting (≈ 5 kbp) (Fig. 3(a)). On these 30 sites, we put an insulator region with length $n_{\rm ins}$. A typical simulation produces ≈ $10^5$ samples, where we record the time $t_a$ for the enhancer to reach the promoter site for the first time.

We show several simulated histograms in Fig. 3(b) with varying insulator density ($\sigma_{\rm ins} = n_{\rm ins}/20 =$
[Figure 3 graphic. (a) Schematics labeled "Low insulator density" and "High insulator density", with the goal distance and insulator region marked. (b) Three columns of histograms; y-axis: Density (log scale); x-axis: First-hitting time $t_a$ (log scale).]
FIG. 3. Histograms of the first-hitting times for varying insulator density σins and looping scale l0 . (a) Schematics of enhancer-
insulator promoter configurations for two insulator densities (”low” and ”high”). The distance between the enhancer (left, blue)
and the promoter (green, right) is 30 sites. The insulators are placed in the middle (orange), where we increase the density
by extending the insulator region (from 0 to 20). We used a spacing of 5 sites between the insulators and the enhancer and
promoter to avoid effects simply due to proximity. (b) Log-binned histograms of the first-hitting times for different insulator
densities. We portray the histograms in two ways: area-normalized (bottom, ”Density”) and hitting-time probability (see Eq.
6) (top, ”Probability”). The colors and markers correspond to a varying amount of insulators (a). The colored bars above
show the 10th percentiles of all data points. The filled and shaded arrows above the lines indicate the mean (filled) and median
(shaded) first-hitting times, respectively. We observe a few peaks and valleys in the histograms, corresponding to different
search trajectories, such as the enhancer finding the promoter on the first try (potentially with a few intermediate fleeting
chromatin interactions) or getting sequestered by the insulator for a significant period.
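The two normalizations named in the caption (area-normalized "Density" and per-bin "Probability", related by Eq. (6)) can be sketched as follows. This is a hedged illustration using only the Python standard library: the function name `log_binned_histogram`, the bin count, and the exponentially distributed sample data are assumptions for this example and stand in for simulated first-hitting times.

```python
import math
import random


def log_binned_histogram(times, n_bins=30):
    """Histogram positive times on logarithmically spaced bins.

    Returns (centers, density, prob, edges), where prob[k] is the fraction of
    samples in bin k and density[k] = prob[k] / bin width (area-normalized).
    """
    log_min, log_max = math.log10(min(times)), math.log10(max(times))
    step = (log_max - log_min) / n_bins
    edges = [10 ** (log_min + k * step) for k in range(n_bins + 1)]
    counts = [0] * n_bins
    for t in times:
        k = int((math.log10(t) - log_min) / step)
        # Clamp so the largest sample (and rounding noise) lands in an edge bin.
        counts[min(max(k, 0), n_bins - 1)] += 1
    total = len(times)
    prob = [c / total for c in counts]
    density = [p / (edges[k + 1] - edges[k]) for k, p in enumerate(prob)]
    centers = [math.sqrt(edges[k] * edges[k + 1]) for k in range(n_bins)]
    return centers, density, prob, edges


# Placeholder data: exponentially distributed "hitting times" (illustrative only).
random.seed(1)
samples = [random.expovariate(0.1) for _ in range(5000)]
centers, density, prob, edges = log_binned_histogram(samples)

# Eq. (6) per bin: probability = density * bin width, so the area sums to one.
area = sum(d * (edges[k + 1] - edges[k]) for k, d in enumerate(density))
```

Because the bin widths grow with $t_a$, the density and probability panels emphasize different parts of the same data: the density is large at short times, whereas the per-bin probability highlights where most samples actually fall.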
0.0, 0.1, 0.5, 1.0) (see legend). We portray the histogram in two ways to highlight different features. The lower panels show histograms $\rho(t_a)$, where the area is normalized to unity ("Density"). The upper panels ("Probability") show the enhancer's probability $P(t_a)$ of finding the promoter within a specific time interval $\Delta t$. We calculate this probability from the density $\rho$ as

\[
P(t_a) = \int_{t_a}^{t_a + \Delta t} \rho(u)\,du \approx \rho(t_a) \times \Delta t. \tag{6}
\]

Above the histograms, we also show the accumulated probability as colored stripes. These stripes provide an intuition for the weight of data points, where each shift into a shaded color indicates 10 percentiles of the data. Lastly, we indicate the average and median hitting times by filled arrows above the $P(t_a)$ curves, where the larger arrow indicates the mean.

Consider the leftmost histogram column. Plotted on a log scale, the hitting-time distributions are broad, and most data points follow the same trend until $t_a \sim 10^{-1}$ and then spread apart. This means the short-time dynamics are insulator-independent. We interpret this as the enhancer looping into the target after a few short-lived encounters with surrounding chromatin
corresponding to different types of search trajectories. The left peaks are associated with enhancer-promoter contact events not involving insulators (as explained above). This contrasts with the right peaks, representing the most probable hitting time. These peaks shift towards larger $t_a$ with growing insulator densities, making the distributions orders of magnitude broader.

Apart from simulated data (symbols), the plots contain several dashed lines with identical colors. These lines closely follow the data points and represent a numerical inverse Laplace transform of a theoretical search-and-resetting model for $\rho(t_a)$ (Eq. (17)), which we derive in Sec. IV. In addition, two black dashed lines show local exponential fits for the two peaks. In particular, the right black line follows the simple relationship $\rho(t_a) \sim \exp(-t_a/\tau_a)$, where $\tau_a$ is the mean hitting time derived analytically (Eq. (19), Sec. IV). All dashed curves show a good agreement between simulation and the analytic theory.

Now consider all the histograms in Fig. 3(b). Each column shows the hitting-time distribution for varying looping scale $l_0$ (values are indicated in the bottom right). While this parameter is often interpreted as the Kuhn length, it is also a proxy for chromatin compaction. As discussed in [16], typical looping scales differ significantly for different types of chromatin folding and density (e.g., space-filling versus self-avoiding). By lowering $l_0$ in our model, thus mimicking less dense folding, we note that the left peaks disappear and that the histograms become insensitive to the insulator density. If $l_0$ increases, the peaks become more pronounced, indicating increasing enhancer-insulator interactions due to less time spent looping (and thus allowing for more possible contacts), resulting in greater insulation.

[Figure 4 graphic. (a) y-axis: First-hitting dissimilarity (0.30 to 0.40); x-axis: Insulator density $\sigma_{\rm ins}$ (0 to 1.0). (b) y-axis: Density $H(\chi_{ij})$; x-axis: Uniformity index $\chi_{ij}$ (0 to 1.0); legend: insulator density $\sigma_{\rm ins}$ = 0, 0.1, 0.5, 1.0.]

FIG. 4. Heterogeneous search trajectories. (a) Relative difference between mean and median (see the inset equation) from Fig. 3(b) (filled and shaded arrows). Note the non-monotonic growth and decay. (b) Uniformity index histograms (normalized), $H(\chi_{ij})$, for varying insulator densities. The bimodal shapes indicate heterogeneous enhancer-promoter search trajectories.
gion, possibly several times, before reaching the promoter site. Because of the heterogeneous dynamics, it is difficult to assign a typical enhancer-promoter hitting time scale describing the breadth of time scales. Finally, we point out that this heterogeneity gets reinforced by unsuccessful enhancer-promoter binding attempts, when they must come together several times before forming a stable complex. However, we do not explore this

[Figure 5 graphic. (a) y-axis: Normalized mean fp τ (log scale); x-axis: Insulator density $\sigma_{\rm ins}$ (log scale); annotations: Slope, $l_0 = 0.094$, $E$ from −8 to −18. (b) annotation: $l_0 = 0.24$.]

E. Insulation efficiency

The histograms in Fig. 3(b) show that the distribution of hitting times changes with insulator density $\sigma_{\rm ins}$ and looping scale $l_0$. But there is yet another critical variable: the interaction energy $E$ between insulators and
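[Figure 6 graphic. Lattice with sites 1, 2, ..., j, ..., N. Arrows show jumps out of $i_0$ with rates $\omega(i_0, 1)$, $\omega(i_0, 2)$, $\omega(i_0, j)$, $\omega(i_0, N)$, and $\omega(i_0, a)$, and resets back to $i_0$ with rates $r(1|i_0)$, $r(2|i_0)$, $r(j|i_0)$, and $r(N|i_0)$. Labels: absorbing target $a$; sites with "Insulator" and "No insulator"; "Associated" and "Bound" states in equilibrium.]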
FIG. 6. A schematic of the effective model that captures the general model in Sec. II A. All rates are shown for an enhancer (or insulator) that starts at $i_0$, jumps to some node $j$ with a rate $\omega(i_0, j)$, and resets from it with the resetting rate $r(j|i_0)$. Note that the resetting rate is a combination of rates, assuming equilibrium between the associated and bound states (see Sec. II A and Supplementary S5). At some of the sites, an insulator might be present (marked in orange), which affects the resetting rate.
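The idea of folding the "associated" and "bound" states into one effective resetting rate can be checked with a small Gillespie-style simulation. The sketch below is illustrative rather than the authors' code: it assumes the form $r = k_o/(1 + k_b/k_u)$ given in Eq. (10), the rate values are hypothetical and in arbitrary units, and the function names are invented for this example. For this three-state scheme, the mean residence time at the visited site, starting from the associated state, works out to $(1 + k_b/k_u)/k_o = 1/r$.

```python
import random


def effective_reset_rate(k_o, k_b, k_u):
    """Effective resetting rate of the form in Eq. (10): r = k_o / (1 + k_b / k_u)."""
    return k_o / (1.0 + k_b / k_u)


def sample_residence_time(k_o, k_b, k_u, rng):
    """One stochastic trajectory: start 'associated', stop when the loop resets."""
    t, state = 0.0, "associated"
    while True:
        if state == "associated":
            total = k_o + k_b
            t += rng.expovariate(total)      # waiting time until the next event
            if rng.random() < k_o / total:
                return t                     # loop breaks: reset to i0
            state = "bound"                  # otherwise bind more strongly
        else:
            t += rng.expovariate(k_u)        # the only exit from 'bound'
            state = "associated"


rng = random.Random(0)
k_o, k_b, k_u = 1.0, 3.0, 2.0                # hypothetical rates, arbitrary units
n_samples = 20000
mean_time = sum(sample_residence_time(k_o, k_b, k_u, rng)
                for _ in range(n_samples)) / n_samples
r = effective_reset_rate(k_o, k_b, k_u)      # 0.4 for these rates, so 1/r = 2.5
```

With these rates the sampled mean residence time should be close to $1/r = 2.5$, up to statistical noise. The full residence-time distribution is not a single exponential, so the effective rate captures the mean rather than every detail of the local dynamics.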
either return directly to $i_0$ or become bound at $i$. Regardless of the process, the enhancer eventually returns to $i_0$ like a resetting event; see the schematic in Fig. 6. The difference between the two scenarios is that the residence time is longer if bound, where the enhancer must pass through the "associated" state before it may reset to site $i_0$. Without insulators, the resetting rate $r$ is the same across all lattice sites, and this system constitutes a typical resetting problem studied by several authors [43–47]. In our notation, the master equation for $P_i(t)$ when the resetting rate is constant reads

\[
\frac{dP_i(t)}{dt} = \sum_j \omega(i,j) P_j(t) - r P_i(t) + \sum_{j \neq a} r P_j(t)\,\delta_{i,i_0} - \rho_a(t)\,\delta_{i,a}, \tag{8}
\]

where we write the jumping rates as

\[
\omega(i,j) =
\begin{cases}
-\sum_{k \neq i_0} k_l(k|j), & j = i = i_0,\\
k_l(i|j), & j = i_0,\\
0, & \text{otherwise}.
\end{cases}
\tag{9}
\]

Here, the diagonal element $\omega(i_0, i_0) = -\sum_{k \neq i_0} k_l(k|i_0)$ represents all outgoing loops from the enhancer's position $i_0$ to anywhere on the lattice, and $\omega(i_0, j) = k_l(j|i_0)$ is the loop from $i_0$ to $j$. In addition to $\omega(i,j)$ and $r$, the master equation includes a sink term $\rho_a(t)\delta_{i,a}$ for the target, where $\rho_a(t)$ is the first-hitting time distribution [48, 49]. Because of this sink, we must compensate with another term proportional to the survival probability $\sum_{j \neq a} P_j(t)$ (3rd term, right-hand side) that ensures there is no resetting if the enhancer has already reached the target.

Equation (8) is solvable using standard methods. However, the situation changes when there are insulators. It changes because enhancers and insulators may bind each other, thus creating a non-uniform binding landscape. This manifests as a position-dependent resetting rate $r(j|i)$, which presents a more complex problem than Eq. (8) that cannot be solved with standard methods.

To find $r(j|i)$ as a function of the rates and energies outlined in Sec. II A, we assume that the "bound" and "associated" states are in equilibrium. This assumption gives (see derivation in SIII)

\[
r(j|i) = \frac{k_o(j|i)}{1 + \dfrac{k_b(j|i)}{k_u(j|i)}} = \delta\,\frac{e^{-(\Delta G_l - E_a(j))}}{1 + e^{E_a(j) - E_b(j)}}. \tag{10}
\]

It is essential to realize that this resetting rate depends critically on the insulators' positions as they determine the binding landscape $E_b(j)$ when forming loops with the surrounding chromatin. We show below how we estimate $E_b(j)$ assuming the insulators' spatial probability density is in equilibrium. But first, we reformulate the master equation Eq. (8) with a position-dependent resetting rate in an analytically solvable form.

To this end, we decompose the resetting rate into two parts,

\[
r(j|i) = r^* + \Delta r(j|i), \tag{11}
\]

where $r^* = \min_j (r(j|i))$ is the smallest resetting rate, and redefine the looping matrix to

\[
\nu(i,j) =
\begin{cases}
\omega(i,j) - \Delta r(j|i_0), & j = i,\\
\omega(i,j) + \Delta r(j|i_0), & i = i_0,\\
\omega(i,j), & \text{otherwise}.
\end{cases}
\tag{12}
\]

Using this in Eq. (8) gives

\[
\frac{dP_i(t)}{dt} = \sum_j \nu(i,j) P_j(t) - r^* P_i(t) + r^* Q_a(t)\,\delta_{i,i_0} - \rho_a(t)\,\delta_{i,a}, \tag{13}
\]
where $Q_a(t) = \sum_{j \neq a} P_j(t)$ is the survival probability. This master equation is now in a standard form and analytically solvable. It represents one of our paper's main results. But before presenting the analytical solution, we outline the basic arguments for including the insulator dynamics in the resetting rate.

In our simulations, we update the positions of the enhancer and the insulators by drawing loops with rates $k_l(j|i)$. If the enhancer and one of the insulators happen to end up on the same lattice site, they form a complex with rate $k_b(j)$. To include this process in the resetting rate $r(j|i)$, we make two assumptions. First, we assume that the insulators' probability density function is equilibrated (implying rapid looping dynamics). We calculate the insulator's probability distribution $P_j^{\rm (ins)}$ analytically from Eq. (8) by leaving out the target (see Eq. (21)). We overlay a few examples of $P_j^{\rm (enh)}$ alongside simulated data in Fig. 7(a); the agreement is excellent.

[Figure 7 graphic. (a) y-axis: $P_j^{\rm (enh)}$ (log scale, $10^{-5}$ to $10^{-1}$); x-axis: Site index $j$ (0 to 40); annotations: Enhancer, Insulators, Promoter. (b) y-axis: First-hitting moments (log scale, $10^4$ to $10^7$); curve label: Variance.]

Second, we assume that the resetting rate $r(j|i)$ for an enhancer is a weighted combination of the resetting rate with or without an insulator at site $j$. Denoting the rates for these cases as $r(j|i)_{\rm ins}$ and $r(j|i)_{\rm no\ ins}$, we obtain
This solution allows us to extract the average hitting time analytically in terms of $\nu$. By expanding $\rho_a(s)$ for small $s$ (long-time limit) and using

$$\rho_a(s) \simeq 1 - \langle t_a \rangle s + \frac{s^2}{2} \langle t_a^2 \rangle - \ldots, \tag{18}$$

we obtain to first order in $s$ [50]

$$\langle t_a \rangle = \frac{\sum_{j \neq 1} \frac{B_j - A_j}{r^* - \lambda_j}}{A_1 + r^* \sum_{j \neq 1} \frac{A_j}{r^* - \lambda_j}}. \tag{19}$$

A similar expansion focusing on the second-order terms yields the second moment

$$\langle t_a^2 \rangle = \frac{2}{\left( A_1 + \sum_{j \neq 1} \frac{r^* A_j}{r^* - \lambda_j} \right)^2} \times \left[ \sum_{j \neq 1} \left( -\frac{r^* A_j}{(r^* - \lambda_j)^2} + \frac{B_j}{r^* - \lambda_j} \right) \times \sum_{j \neq 1} \frac{B_j - A_j}{r^* - \lambda_j} + \left( A_1 + \sum_{j \neq 1} \frac{r^* A_j}{r^* - \lambda_j} \right) \sum_{j \neq 1} \frac{B_j - A_j}{(r^* - \lambda_j)^2} \right]. \tag{20}$$

In the supplementary material (Sec. SIV), we provide an explicit formula for the variance $\sigma^2 = \langle t_a^2 \rangle - \langle t_a \rangle^2$.

To test the theory against simulations, we compare the variance and mean in Fig. 7(b). The simulations (symbols) and the theory (black dashed lines) show excellent agreement.

As a final result, we show that the analytical result for the probability density $P_i(t)$ is valid for both enhancers and insulators. If we ignore the absorbing target (thus setting $Q_a(s) = 1$), Eq. (16) becomes

$$P_i(t) = \sum_j V_{ij} V^{-1}_{j i_0} \, \frac{r^* - \lambda_j e^{(\lambda_j - r^*) t}}{r^* - \lambda_j}, \tag{21}$$

where the initial condition here is $P_i(0) = \delta_{i,i_0}$. In the steady state, this equation simplifies to

$$P_i(t \to \infty) = V_{i1} V^{-1}_{1 i_0} + \sum_{j \neq 1} V_{ij} V^{-1}_{j i_0} \, \frac{r^*}{r^* - \lambda_j}, \tag{22}$$

which we used to estimate the insulators' positions in the resetting rate $r(j|i)$. We show the complete derivation in the supplementary material (Sec. SV).

V. DISCUSSION AND CONCLUSION

Cells use a complex web of enhancer-insulator interactions to support gene regulatory networks and orchestrate signaling cascades during development. Insulator elements are critical to prevent unintended gene activation and are relatively simple from a genetic point of view: removing them from DNA or abolishing the associated transcription factors causes gene activation. Yet, it remains unclear how this happens because of recent conflicting empirical observations [14]. This paper presents a new mechanistic model where insulators bind weakly to surrounding chromatin and the enhancer rather than to each other, which is a common assumption.

This assumption largely derives from so-called insulator bypass [51–53]. This phenomenon refers to a family of genetic experiments aiming to neutralize insulator-enhancer blocking by genetically inserting a piece of foreign DNA. Known as transgenic constructs, these DNA pieces contain yet another insulator (specifically, a short DNA sequence that attracts insulator-binding factors), and several studies show that these constructs help remove the blocking when sandwiched between the insulator-enhancer pair. The governing explanation for these observations is that insulators pair up and form a loop. This theory gains further support from measurements showing that some insulator-binding factors can bind each other (e.g., [53]). However, most transgenic experiments study short-ranged interactions, typically less than 5 kb. While insulator pairing could be the primary insulation mechanism at short distances, it is doubtful that it is so over long distances. This is the principal observation in [14] using Hi-C data, with 5 kb as the lower resolution limit. They could not detect notable insulator-insulator interactions across thousands of pairs.

These observations form the starting point of this work. We aimed to establish a new biophysical model that did not rely on specific insulator-insulator interactions. Instead, it rests on generic but weak insulator-chromatin interactions. From a few assumptions, we calibrated the model to a few independent empirical datasets and derived analytical results that fit empirical observations.

One assumption that likely represents an oversimplification is that the insulators have equal binding strengths to enhancers and the surrounding chromatin. We also limit this study to simple enhancer-insulator-promoter configurations. But in reality, gene clusters have more complex arrangements, including many enhancers, insulators, and promoters, all having heterogeneous binding strengths. It would be instructive to investigate how hitting frequencies between select enhancer-promoter pairs respond to changes in these variables. For example, will one strong insulator do the same job as a few weak ones? We leave these questions for future work.

To close, mammalian genomes harbor ∼20,000 genes regulated by ∼900,000 enhancer-like elements interspersed with ∼30,000 CTCF-bound sites, many of which act as insulators [54]. These elements represent critical components of gene regulatory networks that also seem to shape DNA's spatial organization by forming complex networks of 3D interactions and semi-hierarchical
3D communities [55] (e.g., Topologically Associated Domains and A/B compartments). Thus, unveiling the mechanisms of insulation is a critical step to understanding the causal mechanisms of the structure-function relationship of interphase chromosomes.

ACKNOWLEDGMENTS

We acknowledge financial support from the Swedish Research Council (grant no. 2017-03848 and no. 2021-04080). RM acknowledges funding from the German Science Foundation (DFG, grant ME 1535/12-1). We want to thank Dr. Yuri Schwartz and Dr. Tatiana Kahn (Umeå University) for providing experimental data in Fig. 1(c) and valuable discussions. The computations were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS) and the Swedish National Infrastructure for Computing (SNIC) at High-Performance Computing Center North (HPC2N), partially funded by the Swedish Research Council through grant agreements no. 2022-06725 and no. 2018-05973.

L.L. and L.H. devised the study. L.H. performed the simulations, analytical solutions and visualizations. All authors contributed to writing the manuscript.
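As a numerical aside to the small-$s$ expansion in Eq. (18), the sketch below shows how the first two hitting-time moments can be read off a Laplace-transformed density through $\langle t_a \rangle = -\rho_a'(0)$ and $\langle t_a^2 \rangle = \rho_a''(0)$. It is only an illustration under stated assumptions: a two-step (hypoexponential) waiting time stands in for the model's hitting-time density, and the names `rho`, `moments_from_laplace`, `k1`, `k2`, and `h` are hypothetical and do not appear in the paper.

```python
# Toy check of the small-s moment expansion, Eq. (18).
# Assumption: a sum of two exponential waiting times replaces the model's
# hitting-time density; k1 and k2 are illustrative rates, not fitted values.


def rho(s, k1=2.0, k2=0.5):
    """Laplace transform of the density of a sum of two exponential waits."""
    return (k1 / (k1 + s)) * (k2 / (k2 + s))


def moments_from_laplace(rho_fn, h=1e-4):
    """Return (mean, second moment) from central differences at s = 0.

    Eq. (18) gives <t> = -rho'(0) and <t^2> = rho''(0).
    """
    mean = -(rho_fn(h) - rho_fn(-h)) / (2.0 * h)
    second = (rho_fn(h) - 2.0 * rho_fn(0.0) + rho_fn(-h)) / h**2
    return mean, second


mean, second = moments_from_laplace(rho)
variance = second - mean**2  # sigma^2 = <t^2> - <t>^2
```

For this toy density the exact values are $\langle t \rangle = 1/k_1 + 1/k_2 = 2.5$ and $\sigma^2 = 1/k_1^2 + 1/k_2^2 = 4.25$, which the finite-difference estimates reproduce up to discretization error.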
[1] C. Chakraborty, I. Nissen, C. A. Vincent, A.-C. Hägglund, A. Hörnblad, and S. Remeseiro, Nature Communications 14, 6446 (2023).
[2] A. Panigrahi and B. W. O'Malley, Genome Biology 22, 1 (2021).
[3] P. Geyer and I. Clark, Cellular and Molecular Life Sciences CMLS 59, 2112 (2002).
[4] M. Mohrs, C. M. Blankespoor, Z.-E. Wang, G. G. Loots, V. Afzal, H. Hadeiba, K. Shinkai, E. M. Rubin, and R. M. Locksley, Nature Immunology 2, 842 (2001).
[5] A. Udvardy, E. Maine, and P. Schedl, Journal of Molecular Biology 185, 341 (1985).
[6] C. Holdridge and D. Dorsett, Molecular and Cellular Biology (1991).
[7] A. M. Bushey, E. Ramos, and V. G. Corces, Genes & Development 23, 1338 (2009).
[8] J. R. Raab and R. T. Kamakaka, Nature Reviews Genetics 11, 439 (2010).
[9] S. S. Rao, M. H. Huntley, N. C. Durand, E. K. Stamenova, I. D. Bochkov, J. T. Robinson, A. L. Sanborn, I. Machol, A. D. Omer, E. S. Lander, et al., Cell 159, 1665 (2014).
[10] J. R. Dixon, S. Selvaraj, F. Yue, A. Kim, Y. Li, Y. Shen, M. Hu, J. S. Liu, and B. Ren, Nature 485, 376 (2012).
[11] A. L. Sanborn, S. S. Rao, S.-C. Huang, N. C. Durand, M. H. Huntley, A. I. Jewett, I. D. Bochkov, D. Chinnappan, A. Cutkosky, J. Li, et al., Proceedings of the National Academy of Sciences 112, E6456 (2015).
[12] K. Polovnikov and B. Slavov, Physical Review E 107, 054135 (2023).
[13] B. Doyle, G. Fudenberg, M. Imakaev, and L. A. Mirny, PLoS Computational Biology 10, e1003867 (2014).
[14] T. G. Kahn, M. Savitsky, C. Kuong, C. Jacquier, G. Cavalli, J.-M. Chang, and Y. B. Schwartz, Science Advances 9, eade0090 (2023).
[15] Y. B. Schwartz and G. Cavalli, Genetics 205, 5 (2017).
[16] J. Dekker and L. Mirny, Cell 164, 1110 (2016).
[17] L. Schäfer, C. von Ferber, U. Lehr, and B. Duplantier, Nuclear Physics B 374, 473 (1992).
[18] A. Hanke and R. Metzler, Biophysical Journal 85, 167 (2003).
[19] L. Lizana, N. Nahali, and Y. B. Schwartz, Journal of Biological Chemistry 299 (2023).
[20] M. J. Lundkvist, L. Lizana, and Y. B. Schwartz, Science Advances 9, eadj8198 (2023).
[21] S. Berry, C. Dean, and M. Howard, Cell Systems 4, 445 (2017).
[22] O. Bénichou, C. Loverdo, M. Moreau, and R. Voituriez, Reviews of Modern Physics 83, 81 (2011).
[23] M. Cencini and S. Pigolotti, Nucleic Acids Research 46, 558 (2017).
[24] C. G. Kalodimos, R. Boelens, and R. Kaptein, Nature Structural Biology 9, 193 (2002).
[25] O. Bénichou, Y. Kafri, M. Sheinman, and R. Voituriez, Physical Review Letters 103, 138102 (2009).
[26] C. Felipe, J. Shin, and A. B. Kolomeisky, The Journal of Physical Chemistry B 125, 1727 (2021).
[27] A. A. Shvets and A. B. Kolomeisky, The Journal of Physical Chemistry Letters 7, 5022 (2016).
[28] S. Vemulapalli, M. Hashemi, A. B. Kolomeisky, and Y. L. Lyubchenko, The Journal of Physical Chemistry B 125, 4645 (2021).
[29] R. Phillips, J. Kondev, J. Theriot, and H. Garcia, Physical Biology of the Cell (Garland Science, 2012).
[30] Y. Kafri, D. Mukamel, and L. Peliti, Physica A: Statistical Mechanics and its Applications 306, 39 (2002).
[31] G. D. Stormo and D. S. Fields, Trends in Biochemical Sciences 23, 109 (1998).
[32] O. G. Berg and P. H. von Hippel, Journal of Molecular Biology 193, 723 (1987).
[33] O. Fornes, J. A. Castro-Mondragon, A. Khan, R. Van der Lee, X. Zhang, P. A. Richmond, B. P. Modi, S. Correard, M. Gheorghe, D. Baranašić, et al., Nucleic Acids Research 48, D87 (2020).
[34] L. Finzi and J. Gelles, Science 267, 378 (1995).
[35] B. v. d. Broek, F. Vanzi, D. Normanno, F. S. Pavone, and G. J. Wuite, Nucleic Acids Research 34, 167 (2006).
[36] A. S. Weintraub, C. H. Li, A. V. Zamudio, A. A. Sigova, N. M. Hannett, D. S. Day, B. J. Abraham, M. A. Cohen, B. Nabet, D. L. Buckley, et al., Cell 171, 1573 (2017).
[37] L. A. Mirny, Chromosome Research 19, 37 (2011).
[38] T. G. Mattos, C. Mejía-Monasterio, R. Metzler, G. Oshanin, and G. Schehr, in First-Passage Phenomena and Their Applications (World Scientific, 2014) pp. 203–225.
[39] D. S. Grebenkov, R. Metzler, and G. Oshanin, Communications Chemistry 1, 96 (2018).
[40] T. G. Mattos, C. Mejía-Monasterio, R. Metzler, and G. Oshanin, Physical Review E 86, 031143 (2012).
[41] C. Mejía-Monasterio, G. Oshanin, and G. Schehr, Journal of Statistical Mechanics: Theory and Experiment