Machine-learning hidden symmetries

Ziming Liu and Max Tegmark


Department of Physics, Massachusetts Institute of Technology, Cambridge, USA
(Dated: September 21, 2021)
We present an automated method for finding hidden symmetries, defined as symmetries that become manifest only in a new coordinate system that must be discovered. Its core idea is to quantify asymmetry as violation of certain partial differential equations, and to numerically minimize such violation over the space of all invertible transformations, parametrized as invertible neural networks. For example, our method rediscovers the famous Gullstrand-Painlevé metric that manifests hidden translational symmetry in the Schwarzschild metric of non-rotating black holes, as well as Hamiltonicity, modularity and other simplifying traits not traditionally viewed as symmetries.

arXiv:2109.09721v1 [cs.LG] 20 Sep 2021

INTRODUCTION

Philip Anderson famously argued that "It is only slightly overstating the case to say that physics is the study of symmetry" [1], and discovering symmetries has proven enormously useful both for deepening understanding and for solving problems more efficiently, in physics [1-3] as well as machine learning [4-10].

Discovering symmetries is useful but hard, because they are often not manifest but hidden, becoming manifest only after an appropriate coordinate transformation. For example, after Schwarzschild discovered his eponymous black hole metric, it took 17 years until Painlevé, Gullstrand and Lemaître showed that it had hidden translational symmetry: they found that the spatial sections could be made translationally invariant with a clever coordinate transformation, thereby deepening our understanding of black holes [11]. As a simpler example, Fig. 1 shows the same vector field in two coordinate systems where rotational symmetry is manifest and hidden, respectively.

[Figure 1: three phase-space flow plots with panel titles (a) Manifest Symmetry, (b) Hidden Symmetry, (c) Discovered Manifest Symmetry.]
FIG. 1: 1D harmonic oscillator phase space flow vector field f(x, p) = (p, −x). The rotational symmetry of f is manifest in (a) and hidden in (b). Our algorithm can reveal the hidden symmetry by auto-discovering the transformation from (b) to (c).

Our results below are broadly applicable because they apply to a very broad definition of symmetry, including not only invariance and equivariance with respect to arbitrary Lie groups, but also modularity and Hamiltonicity. If a coordinate transformation is discovered that makes such simplifying properties manifest, this can not only deepen our understanding of the system in question, but also enable an arsenal of more efficient numerical methods for studying it.

Discovering hidden symmetries is unfortunately highly non-trivial, because it involves a search over all smooth invertible coordinate transformations, and has traditionally been accomplished by scientists making inspired guesses. The goal of this Letter is to present a machine learning method for automating hidden symmetry discovery. Its core idea is to quantify asymmetry as violation of certain partial differential equations, and to numerically minimize such violation over the space of all invertible transformations, parametrized as invertible neural networks. For example, the neural network automatically learns to transform Fig. 1(b) into Fig. 1(c), thereby making the hidden rotational symmetry manifest.

[Figure 2: workflow schematic. A trainable neural network maps z ↦ z′ and supplies the Jacobian W ≡ ∂z′/∂z; the fixed tensor field with hidden symmetry T(z) is pushed through the fixed tensor transformation rule to give T′(z′), which feeds the loss |L̂T′|².]
FIG. 2: Schematic workflow of our algorithm for discovering hidden symmetry.

In the Method section, we introduce our notation and symmetry definition and present our method for hidden symmetry discovery. In the Results section, we apply our method to classical mechanics and general relativity examples to test its ability to auto-discover hidden symmetries.

TABLE I: PDE and Losses for Generalized Symmetries

Generalized symmetry   | Linear operator L̂                                           | Loss ℓ | Examples
Translation invariance | L̂_j = ∂_j                                                   | ℓ_TI   | A, E, F
Lie invariance         | L̂_j = K_j z · ∇                                             | ℓ_INV  | E, F
Lie equivariance       | L̂_j = K_j z · ∇ − nK_j (n = ±1)                             | ℓ_EQV  | B
Canonical equivariance | L̂^x_j = K_j x · ∇_x − K_j^t p · ∇_p + K_j^t,                | ℓ_CAN  | C
                       | L̂^p_j = K_j x · ∇_x − K_j^t p · ∇_p − K_j                   |        |
Hamiltonicity          | L̂_ij = −m_i^t ∂_j + m_j^t ∂_i                               | ℓ_H    | A, B, C, D
Modularity             | L̂_ij = A_ij ẑ_i^t ∂_j                                       | ℓ_M    | D

METHOD

PDEs encoding generalized symmetries

We seek to discover symmetries in various tensor fields T(z) for z ∈ R^n, for example the vector field f(z) (a rank-1 tensor) defining a dynamical system z(t) through a vector differential equation ż = f(z), or the metric g(z) (a rank-2 tensor) quantifying spacetime geometry in general relativity. We say that a tensor field T has a generalized symmetry if it obeys a linear partial differential equation (PDE) L̂T = 0, where L̂ is a linear operator that encodes the symmetry generators. This definition covers a broad range of interesting situations, as illustrated by the examples below (see Table I for a summary).

Translational invariance: A tensor field T is invariant under translation in the j-th coordinate direction ẑ_j if T(z + aẑ_j) = T(z) for all z and a, which is equivalent to satisfying the PDE ∂T/∂z_j = 0, corresponding to the linear operator L̂ = ∂_j.

Lie invariance & equivariance: If a vector field f(z) satisfies f(gz) = g^n f(z) for all elements g of some Lie group G and an integer n = −1, 0 or 1, then we say that f is invariant if n = 0, and equivariant otherwise (n = 1 corresponds to a contravariant (1,0) vector field and n = −1 corresponds to a covariant (0,1) vector field). Differentiating the identity T(e^{K_j a} z) = e^{nK_j a} T(z) with respect to a shows that Lie invariance/equivariance is equivalent to the PDEs L̂_j f = 0 with L̂_j ≡ K_j z · ∇ − nK_j. Figure 1 (a) and (c) show examples of rotational equivariance.

Hamiltonicity: A dynamical system z(t) ∈ R^{2d} obeying a vector differential equation ż = f(z) is called Hamiltonian if f = M∇H for a scalar function H, where

    M ≡ (  0   I
          −I   0 ),    (1)

and I is the d × d identity matrix. Such systems are of great importance in physics, where it is customary to write z = (x, p), because the Hamiltonian function H(z) (interpreted as energy) is a conserved quantity under the system evolution ż = f(z) = (ẋ, ṗ) = M∇H = (∂_p H, −∂_x H). Hamiltonicity thus corresponds to M⁻¹f being a gradient, i.e., to its generalized curl (the antisymmetric part of its Jacobian matrix) vanishing. Letting J ≡ ∇f denote the Jacobian of f (J_ij ≡ f_{i,j}) and using the fact that M⁻¹ = M^t = −M, Hamiltonicity is thus equivalent to satisfying the PDEs L̂_ij f = 0 where L̂_ij f = (MJ + J^t M)_ij for all i and j (n(n−1)/2 independent PDEs in all), corresponding to L̂_ij = −m_i^t ∂_j + m_j^t ∂_i, where m_i are the column vectors of M. In other words, although Hamiltonicity is not traditionally thought of as a symmetry, it conveniently meets our generalized symmetry definition and can thus be auto-discovered with our method.

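As a concrete illustration, take the simplest case d = 1 and z = (x, p), so that M = (0 1; −1 0) and J is the 2 × 2 Jacobian of f = (f_x, f_p). A short calculation gives

    MJ + J^t M = (          0                ∂_x f_x + ∂_p f_p
                   −(∂_x f_x + ∂_p f_p)              0          ),

so the single (n(n−1)/2 = 1) independent PDE states that the flow is divergence-free, ∂_x f_x + ∂_p f_p = 0; the harmonic-oscillator field f(x, p) = (p, −x) of Fig. 1 clearly satisfies this.
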
Canonical equivariance: We define a Hamiltonian system as canonically equivariant if z = (x, p) and the vector field f ≡ (f_x, f_p) satisfies f_x(gx, g⁻¹p) = g^{−t} f_x and f_p(gx, g⁻¹p) = g f_p for all g ∈ G. These two equations are equivalent to the PDEs L̂^x_j f_x = 0 and L̂^p_j f_p = 0 with L̂^x_j = K_j x · ∇_x − K_j^t p · ∇_p + K_j^t and L̂^p_j = K_j x · ∇_x − K_j^t p · ∇_p − K_j. In special cases when the generator K_j is antisymmetric (e.g., for the rotation group), L̂^x_j = L̂^p_j.

Modularity: A dynamical system z(t) obeying ż = f(z) is modular if the Jacobian J = ∇f is block-diagonal, which implies that the components of z corresponding to different blocks evolve independently of each other. More generally, we say that a system is (n_1 + ··· + n_k)-modular if J vanishes except for blocks of size n_1, ..., n_k, which we can write as A ∘ J = 0, where ∘ denotes element-wise multiplication ((A ∘ J)_ij = A_ij J_ij) and the elements of the mask matrix A vanish inside the blocks, equaling 1 otherwise. Although modularity is typically not viewed as a symmetry, it too thus meets our generalized symmetry definition and can be auto-discovered with our method using the matrix PDE A ∘ ∇f = 0, corresponding to the linear operators L̂_ij ≡ A_ij ẑ_i^t ∂_j.

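Both operators are straightforward to evaluate with automatic differentiation. The following minimal PyTorch sketch (our illustration; the paper does not prescribe an implementation) computes the Hamiltonicity and modularity PDE residuals of Table I averaged over a batch of points, omitting the scale-invariant normalization introduced in the next subsection:

    import torch

    def jacobian(f, z):
        # Batched Jacobian J[k]_ij = df_i/dz_j at each sample z[k]; z has shape (N, n).
        return torch.func.vmap(torch.func.jacrev(f))(z)

    def hamiltonicity_residual(f, z):
        # PDE from Table I: (M J + J^t M)_ij = 0, with M the symplectic matrix.
        n = z.shape[1]
        d = n // 2
        M = torch.zeros(n, n)
        M[:d, d:] = torch.eye(d)
        M[d:, :d] = -torch.eye(d)
        J = jacobian(f, z)
        R = M @ J + J.transpose(1, 2) @ M
        return (R ** 2).sum(dim=(1, 2)).mean()

    def modularity_residual(f, z, blocks):
        # PDE from Table I: A o grad(f) = 0; the mask A is 1 outside the allowed blocks.
        n = z.shape[1]
        A = torch.ones(n, n)
        i = 0
        for size in blocks:  # e.g. blocks = (2, 2) for (2+2)-modularity
            A[i:i + size, i:i + size] = 0
            i += size
        return ((A * jacobian(f, z)) ** 2).sum(dim=(1, 2)).mean()

    # The manifestly Hamiltonian flow of Fig. 1 gives zero residual:
    f = lambda zi: torch.stack([zi[1], -zi[0]])
    print(hamiltonicity_residual(f, torch.randn(128, 2)))  # ~ 0
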
Our algorithm for discovering hidden symmetries

We now describe our algorithm for discovering hidden symmetries. Since L̂T = 0 implies manifest symmetry, |L̂T|² is a natural measure of manifest symmetry violation. We therefore define the symmetry loss as

    ℓ ≡ ⟨|L̂T(z)|²⟩ / ⟨|z|²⟩^α,    (2)

where angle brackets denote averaging over some set of points z_i, and α is chosen so that ℓ is scale-invariant, i.e., invariant under a scale transformation z → az, T → a^{m−n} T, L̂ → a^s L̂, ℓ → a^{2(m−n+s−α)} ℓ, if T has m contravariant indices and n covariant indices. Hence we choose α = m − n + s. We jointly search for multiple hidden symmetries by generalizing the numerator of equation (2) to a sum of terms ⟨|L̂T(z)|²⟩, one for each symmetry, denoted by subscripts as in Tab. I.

Discovering hidden symmetry is equivalent to minimizing ℓ over all diffeomorphisms (everywhere differentiable and invertible coordinate transformations), which we parametrize with an invertible neural network. Figure 2 shows the workflow of our algorithm: (1) a neural network transforms z ↦ z′ and obtains the transformation's Jacobian W ≡ dz′/dz; (2) in parallel with (1), we evaluate the known tensor field T at z; (3) we jointly feed W(z) and T(z) into a module which implements the tensor transformation rule and gives T′(z′); (4) we compute the symmetry loss ℓ(T′). Note that only the neural network is trainable, while both the tensor field with hidden symmetry T(z) and the tensor transformation rule are hard-coded in the workflow. We update the neural network with back-propagation to find the coordinate transformation z ↦ z′ that minimizes ℓ. If the resulting ℓ is effectively zero, a hidden symmetry has been discovered.

Neural network and training

We parametrize the coordinate transformation z ↦ z′ as z′ = z + f_NN(z), where f_NN is a fully connected neural network with two hidden layers containing 256 neurons each. We use a tanh activation function rather than the popular ReLU alternative, because our method requires activation functions to be twice differentiable (since the loss function involves first derivatives of the output with respect to the input via the Jacobian W). The invertibility of the mapping z → z′ is guaranteed by the fact that det W → 0 and the loss function ℓ → ∞ if f_NN approaches non-invertibility, as seen in equations (B3)-(B5) in the supplementary material. The supplementary material also provides further technical details on the selection of data points z, neural network initialization and training.

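In PyTorch this parametrization might read as follows (a sketch; zero-initializing the last layer, which makes W exactly the identity at initialization, is our own choice for realizing W ≈ I):

    import torch
    import torch.nn as nn

    class CoordinateTransform(nn.Module):
        # z' = z + f_NN(z): two 256-unit hidden layers with tanh activations,
        # which are twice differentiable (ReLU is not).
        def __init__(self, n):
            super().__init__()
            last = nn.Linear(256, n)
            nn.init.zeros_(last.weight)  # our choice: start exactly at z' = z
            nn.init.zeros_(last.bias)
            self.f = nn.Sequential(nn.Linear(n, 256), nn.Tanh(),
                                   nn.Linear(256, 256), nn.Tanh(), last)

        def forward(self, z):
            return z + self.f(z)

    net = CoordinateTransform(n=2)
    W = torch.func.vmap(torch.func.jacfwd(net))(torch.randn(4, 2))
    print(torch.linalg.det(W))  # = 1 at initialization, far from det W -> 0
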
RESULTS

We will now test our algorithm on 6 physics examples, ranging from classical mechanics to general relativity. Table II summarizes these examples, labeled A, B, ..., F for easy reference, listing their manifestly non-symmetric equations, their simplifying coordinate transformations, their transformed and manifestly symmetric equations, and their discovered hidden symmetries. As we will see, all the symmetries we had hidden in our test examples were rediscovered by our algorithm. The only exception is the transformation for A, where the problem is so simple that an infinite family of transformations give equal symmetry.

Warmup examples

To build intuition for how our method works, we first apply it to the simple warmup examples A, B and C, corresponding to systems involving free motion, harmonic oscillation or the Kepler problem, whose simplicity has been obfuscated by various coordinate transformations. We consider a hidden symmetry to have been tentatively discovered if its corresponding loss drops below a threshold ε = 10⁻³. If that happens, we apply the AI Feynman symbolic regression package [12, 13] to try to discover a symbolic approximation of the learned transformation f_NN that makes the symmetry loss zero. As can be seen in Tab. II and Fig. 3, all hidden symmetries are successfully discovered together with the coordinate transformations that reveal them. This includes not only traditional hidden symmetries such as translational invariance (example A) and rotational equivariance (example B), but also Hamiltonicity and modularity.

A-C were toy examples in the sense that we had hidden the symmetries by deliberate obfuscation. In contrast, the value of our algorithm lies in its ability to discover symmetries hidden not by people but by nature, as in example D (the linearized double pendulum). We see that our method auto-discovers both Hamiltonicity (by finding the correct conjugate momentum variables) and modularity (by auto-discovering the two normal modes), even though neither of these symmetries was manifest in the most obvious physical coordinates (the pendulum angles and angular velocities) [14].

TABLE II: Physical Systems studied. For each system we list the original dynamics ż = f(z) or metric g(z), the discovered transformation z ↦ z′, the manifestly symmetric dynamics ż′ = f′(z′) or metric g′(z′), the manifest symmetries with their (m, n, s, α) values, and whether symbolic solutions were found (transformation, symmetrized system).

A. 1D Uniform Motion
   Original dynamics: d/dt (a, b) = ( (1/2)(a + b) ln(a − b), (1/2)(a + b) ln(a − b) )
   Transformation: (x, p) = ( (1/2) ln(a + b), (1/2) ln(a − b) )
   Symmetric dynamics: d/dt (x, p) = (p, 0)
   Manifest symmetries: Hamiltonicity (1, 0, −1, 0); 1D translational invariance (1, 0, −1, 0)
   Symbolic solution: No, Yes

B. 1D Harmonic Oscillator
   Original dynamics: d/dt (a, b) = ( (1 + a) ln(1 + b), −(1 + b) ln(1 + a) )
   Transformation: (a, b) = ( e^{x/2} − 1, e^{p/2} − 1 )
   Symmetric dynamics: d/dt (x, p) = (p, −x)
   Manifest symmetries: Hamiltonicity (1, 0, −1, 0); SO(2)-equivariance (1, 0, 0, 1)
   Symbolic solution: Yes, Yes

C. 2D Kepler
   Original dynamics: d/dt (a, b, c, d) = ( (1 + a) ln(1 + b), −(1 + b) ln(1 + a) / [8 (ln²(1 + a) + ln²(1 + c))^{3/2}], (1 + c) ln(1 + d), −(1 + d) ln(1 + c) / [8 (ln²(1 + a) + ln²(1 + c))^{3/2}] )
   Transformation: (a, b, c, d) = ( e^{x/2} − 1, e^{px/2} − 1, e^{y/2} − 1, e^{py/2} − 1 )
   Symmetric dynamics: d/dt (x, px, y, py) = ( px, −x/r³, py, −y/r³ )
   Manifest symmetries: Hamiltonicity (1, 0, −1, 0); canonical SO(2)-equivariance (1, 0, 0, 1)
   Symbolic solution: Yes, Yes

D. Double Pendulum (linearized)
   Original dynamics: d/dt (θ1, θ2, θ̇1, θ̇2) = ( θ̇1, θ̇2, −((m1 + m2)g/(m1 l)) θ1 + (m2 g/(m1 l)) θ2, ((m1 + m2)g/(m1 l)) θ1 − ((m1 + m2)g/(m1 l)) θ2 )
   Transformation: (θ1, θ2, θ̇1, θ̇2) = ( −θ+ + θ−, a(θ+ + θ−), −θ̇+ + θ̇−, a(θ̇+ + θ̇−) ), a = √((m1 + m2)/m2)
   Symmetric dynamics: d/dt (θ+, θ̇+, θ−, θ̇−) = ( θ̇+, −ω+² θ+, θ̇−, −ω−² θ− ), ω±² = ((m1 + m2)g/(m1 l)) (1 ± √(m2/(m1 + m2)))
   Manifest symmetries: Hamiltonicity (1, 0, −1, 0); (2+2)-modularity (1, 0, −1, 0)
   Symbolic solution: Yes, Yes

E. Expanding universe & empty space
   Original metric: g_00 = 1, g_0i = 0, g_ij = −t² [δ_ij + k x_i x_j/(1 − kr²)] (FRW metric with a(t) = t)
   Transformation: (t′, x′, y′, z′) = ( t√(1 + r²), tx, ty, tz )
   Symmetric metric: g′ = diag(1, −1, −1, −1) (Minkowski)
   Manifest symmetries: SO(3,1)-invariance (0, 2, 0, −2); 4D translational invariance (0, 2, −1, −3)
   Symbolic solution: Yes, Yes

F. Schwarzschild black hole & GP metric
   Original metric: g_00 = 1 − 2M/r, g_0i = 0, g_ij = −δ_ij − 2M x_i x_j/((r − 2M) r²)
   Transformation: t′ = t + 2M [2u + ln((u − 1)/(u + 1))], (x′, y′, z′) = (x, y, z), u ≡ √(r/2M)
   Symmetric metric: g′_00 = 1 − 2M/r, g′_0i = −√(2M/r) x_i/r, g′_ij = −δ_ij (Gullstrand-Painlevé)
   Manifest symmetries: SO(3)-invariance (0, 2, 0, −2); 3D translational invariance (0, 2, −1, −3)
   Symbolic solution: Yes, Yes

[Figure 3: six panels of loss vs. iteration: (A) 1D Uniform Motion: translational-invariance and Hamiltonicity losses; (B) 1D Harmonic Oscillator: SO(2)-equivariance and Hamiltonicity losses; (C) 2D Kepler: Hamiltonicity and canonical-equivariance losses; (D) Double Pendulum (θ0 = 0.1): Hamiltonicity and (2+2)-modularity losses; (E) FRW & Minkowski metric: Minkowski loss for k = −1, −2, 0, with only k = −1 dropping below 10⁻³; (F) GP coordinate of Schwarzschild: the learned time shift agrees with the exact Gullstrand-Painlevé formula a(u) = 2u + ln((u − 1)/(u + 1)), u = √(r/2M).]
FIG. 3: All hidden symmetries in six tested systems are discovered by our algorithm. The last column shows that we also auto-discover symbolic formulas for five of the transformations and all of the symmetrized dynamics/metrics.

General relativity examples

As a first GR application (example E), we consider the Friedmann-Robertson-Walker (FRW) metric¹ describing a homogeneous and isotropic expanding universe with negative spatial curvature (k = −1) and cosmic scale factor evolution a(t) = t. A GR expert will realize that its Riemann tensor vanishes everywhere, so that there must exist a coordinate transformation revealing this to be simply empty space in disguise, with Poincaré symmetry (Lorentz symmetry and 4D translational symmetry). Discovering this transformation is non-trivial, and is sometimes assigned as a homework problem in graduate courses.

It is easy to show that any metric with Poincaré symmetry must be a multiple of the Minkowski metric η, so we define our Poincaré symmetry loss as ℓ ≡ ⟨||T(z) − η||²⟩/⟨||T(z)||²⟩. Figure 3 (E) shows that the Minkowski loss drops below 10⁻³, indicating that the k = −1 FRW metric is indeed isometric to Minkowski space, while the loss gets stuck above 10⁻³ for k = −2 and k = 0. Applying the AI Feynman symbolic regression package [12] to the learned transformation f_NN reveals the exact formula (x′, y′, z′, t′) = (tx, ty, tz, t√(1 + x² + y² + z²)), which gives vanishing loss.

¹ We use r ≡ √(x² + y² + z²) for brevity in Tab. II, but not in the neural network, which actually takes (t, x, y, z) as inputs.

We now turn to studying the spacetime of a non-rotating black hole described by the Schwarzschild metric (without loss of generality, we set 2M = 1). This problem proved so difficult that it took physicists 17 years to clear up the misconception that something singular occurs at the event horizon, until it was finally revealed that the apparent singularity at r = 2M was merely caused by a poor choice of coordinates [15-18], just as the z-axis is merely a coordinate singularity in spherical coordinates. Our method auto-discovers hidden translational symmetry in the spatial coordinates (x, y, z), revealed by the coordinate transformation t′ = t + 2M [2u + ln((u − 1)/(u + 1))], where u ≡ √(r/2M), which is auto-discovered by applying AI Feynman [12] to the learned transformation f_NN (see Fig. 3, panel F). Since both the original and target metrics have the SO(3) (rotational) spatial symmetry, our neural network parametrizes the coordinate transformation (x, y, z, t) → (x′, y′, z′, t′) via a two-dimensional transformation (r, t) → (r′, t′), where r ≡ √(x² + y² + z²). This transforms the Schwarzschild metric into the famous Gullstrand-Painlevé metric [17, 18], which is seen to be perfectly regular at the event horizon and can be intuitively interpreted simply as flat space flowing inward with the classical escape velocity [11, 19].

CONCLUSIONS

We have presented a machine-learning algorithm for auto-discovering hidden symmetries, and shown it to be effective for a series of examples from classical mechanics and general relativity. Our symmetry definition is very broad, corresponding to the data satisfying a differential equation, which encompasses both traditional invariance and equivariance as well as Hamiltonicity and modularity.

In future work, it will be interesting to seek hidden symmetries in data from both experiments and numerical simulations. Although our examples involved no more than two symmetries at once, it is straightforward to auto-search for a whole library of common symmetries, adopting the best-fitting one and recursively searching for more hidden symmetries until all are found.

Another interesting application is combining our symmetry-discovery method with known numerical techniques exploiting the discovered hidden symmetries. For example, there has been extensive recent progress building equivariant models for datasets with manifest symmetries [4-10], so adding our method as a pre-processing step can enable such techniques to exploit even hidden equivariance.

Acknowledgement: We thank the Center for Brains, Minds, and Machines (CBMM) for hospitality. This work was supported by The Casey and Family Foundation, the Foundational Questions Institute, the Rothberg Family Fund for Cognitive Science and IAIFI through NSF grant PHY-2019786.

[1] P. W. Anderson, More is different, Science 177, 393 (1972).
[2] D. J. Gross, Symmetry in physics: Wigner's legacy, Physics Today 48, 46 (1995).
[3] D. J. Gross and F. Wilczek, Asymptotically free gauge theories. I, Physical Review D 8, 3633 (1973).
[4] T. Cohen and M. Welling, Group equivariant convolutional networks, in International Conference on Machine Learning (PMLR, 2016) pp. 2990-2999.
[5] N. Thomas, T. Smidt, S. Kearnes, L. Yang, L. Li, K. Kohlhoff, and P. Riley, Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds, arXiv preprint arXiv:1802.08219 (2018).
[6] F. B. Fuchs, D. E. Worrall, V. Fischer, and M. Welling, SE(3)-Transformers: 3D roto-translation equivariant attention networks, arXiv preprint arXiv:2006.10503 (2020).
[7] R. Kondor, Z. Lin, and S. Trivedi, Clebsch-Gordan nets: a fully Fourier space spherical convolutional neural network, arXiv preprint arXiv:1806.09231 (2018).
[8] V. G. Satorras, E. Hoogeboom, and M. Welling, E(n) equivariant graph neural networks, arXiv preprint arXiv:2102.09844 (2021).
[9] G. Kanwar, M. S. Albergo, D. Boyda, K. Cranmer, D. C. Hackett, S. Racanière, D. J. Rezende, and P. E. Shanahan, Equivariant flow-based sampling for lattice gauge theory, Phys. Rev. Lett. 125, 121601 (2020).
[10] D. Boyda, G. Kanwar, S. Racanière, D. J. Rezende, M. S. Albergo, K. Cranmer, D. C. Hackett, and P. E. Shanahan, Sampling using SU(N) gauge equivariant flows, Phys. Rev. D 103, 074504 (2021).
[11] C. Misner, K. Thorne, and J. Wheeler, Gravitation (Princeton University Press, 2017).
[12] S.-M. Udrescu, A. Tan, J. Feng, O. Neto, T. Wu, and M. Tegmark, AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity (2020), arXiv:2006.10782 [cs.LG].
[13] S.-M. Udrescu and M. Tegmark, AI Feynman: A physics-inspired method for symbolic regression, Science Advances 6, 10.1126/sciadv.aay2631 (2020).
[14] H. Goldstein, C. Poole, and J. Safko, Classical Mechanics (2002).
[15] R. Penrose, Gravitational collapse and space-time singularities, Phys. Rev. Lett. 14, 57 (1965).
[16] M. D. Kruskal, Maximal extension of Schwarzschild metric, Phys. Rev. 119, 1743 (1960).
[17] P. Painlevé, La mécanique classique et la théorie de la relativité, C. R. Acad. Sci. (Paris) 173, 677 (1921).
[18] A. Gullstrand, Allgemeine Lösung des statischen Einkörperproblems in der Einsteinschen Gravitationstheorie, Arkiv för Matematik, Astronomi och Fysik 16 (8), 1 (1922).
[19] A. J. S. Hamilton and J. P. Lisle, The river model of black holes, American Journal of Physics 76, 519-532 (2008).
[20] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).

Supplementary material

Appendix A: Neural network training details

Preparing data: Our default method for generating training data is to draw the components of z as i.i.d. normalized Gaussians, i.e., z ∼ N(0, I_N). However, there are three issues to note: (1) For the double pendulum, since we focus on the small-angle regime, we instead use a narrowed Gaussian distribution (θ1, θ2, θ̇1, θ̇2) ∼ N(0, 0.1²). (2) To avoid the r = 2M = 1 singularity of the Schwarzschild metric (example F), we draw its radius r from a uniform distribution U(1.1, 6). (3) For the two general relativity examples E and F, the time variable is sampled from a uniform distribution U[0, 3].

The Adam optimizer [20] is employed to train the neural network. The output z′ is computed as z′ = z + f_NN(z) rather than z′ = f_NN(z) to ensure W ≡ ∂z′/∂z ≈ I at initialization, so that W⁻¹ is well-conditioned and training instabilities are avoided. For examples A-D, we train the neural network for 2,000 epochs with learning rate 10⁻³; for examples E and F, we train for 1,000 epochs three times while annealing the learning rate as {5 × 10⁻³, 10⁻³, 2 × 10⁻⁴}.
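
As a sketch of this training configuration (our illustration; the exact angular distribution for example F is our assumption, since only the radial distribution is specified above):

    import torch

    def fit(net, loss_fn, sample_z, schedule):
        # Schedule for examples A-D: [(1e-3, 2000)];
        # for examples E and F: [(5e-3, 1000), (1e-3, 1000), (2e-4, 1000)].
        for lr, epochs in schedule:
            opt = torch.optim.Adam(net.parameters(), lr=lr)
            for _ in range(epochs):
                loss = loss_fn(net, sample_z())
                opt.zero_grad(); loss.backward(); opt.step()
        return net

    def sample_z_example_F(N=1024):
        # t ~ U[0, 3]; radius r ~ U(1.1, 6) avoids the r = 2M = 1 singularity;
        # directions drawn isotropically (our assumption).
        t = 3 * torch.rand(N, 1)
        u = torch.randn(N, 3)
        x = u / u.norm(dim=1, keepdim=True) * (1.1 + 4.9 * torch.rand(N, 1))
        return torch.cat([t, x], dim=1)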

Appendix B: Tensor transformation rules

Although tensor transformation rules are well-known, we list the relevant ones here for the convenience of any reader wishing to implement our method. We consider a coordinate vector z ∈ R^N and a tensor field T(z). Under the coordinate transformation z → z′, the transformation rule for the tensor field T(z) → T′(z′) depends on the type of T. In general relativity, a contravariant (1,0) vector v^i and a covariant (0,1) vector v_i transform as

    v′^i = (∂z′^i/∂z^j) v^j = W^i_j v^j,
    v′_i = (∂z^j/∂z′^i) v_j = (W⁻¹)^j_i v_j = (W^{−T})_i^j v_j,    (B1)

where W^i_j ≡ ∂z′^i/∂z^j. More generally, an (m, n) tensor T^{i_1···i_m}_{j_1···j_n} transforms as

    T′^{i′_1···i′_m}_{j′_1···j′_n} = W^{i′_1}_{i_1} ··· W^{i′_m}_{i_m} (W⁻¹)^{j_1}_{j′_1} ··· (W⁻¹)^{j_n}_{j′_n} T^{i_1···i_m}_{j_1···j_n}.    (B2)

In this paper, we are interested in these specific cases (implemented in the sketch below):

• (1, 0) vector: The transformation rule is f → f′ = Wf; the dynamical system ż = f(z), where ż ≡ dz/dt, falls in this category.

• (0, 2) tensor: The transformation rule is g → g′ = W^{−T} g W⁻¹; the metric tensor g_{µν} from general relativity lies in this category.
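
For concreteness, the two rules can be implemented in a few lines (a PyTorch sketch, ours, for batched inputs):

    import torch

    def transform_vector(W, f):
        # (1,0) vector: f'^i = W^i_j f^j.  W: (N, n, n), f: (N, n).
        return torch.einsum('nij,nj->ni', W, f)

    def transform_metric(W, g):
        # (0,2) tensor: g' = W^{-T} g W^{-1}, i.e. g'_ij = (W^{-1})_ki g_kl (W^{-1})_lj.
        Winv = torch.linalg.inv(W)
        return torch.einsum('nki,nkl,nlj->nij', Winv, g, Winv)
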
We define the first-order differentiation of T with respect to z as J ≡ ∇T, or J^{i_1···i_m}_{j_1···j_n, l} ≡ ∂_l T^{i_1···i_m}_{j_1···j_n}. Written as such, J is not an (m, n+1) tensor, because its transformation rule is

    J′^{i′_1···i′_m}_{j′_1···j′_n, l′} ≡ ∂_{l′} T′^{i′_1···i′_m}_{j′_1···j′_n}
        = (W^{−T})_{l′}^{l} ∂_l [ W^{i′_1}_{i_1} ··· W^{i′_m}_{i_m} (W⁻¹)^{j_1}_{j′_1} ··· (W⁻¹)^{j_n}_{j′_n} T^{i_1···i_m}_{j_1···j_n} ]
        = (W^{−T})_{l′}^{l} Σ_{a=1}^{m} (∂_l W^{i′_a}_{i_a}) ( Π_{b=1, b≠a}^{m} W^{i′_b}_{i_b} ) ( Π_{c=1}^{n} (W⁻¹)^{j_c}_{j′_c} ) T^{i_1···i_m}_{j_1···j_n}
        + (W^{−T})_{l′}^{l} Σ_{a=1}^{n} ( Π_{b=1}^{m} W^{i′_b}_{i_b} ) ( ∂_l (W⁻¹)^{j_a}_{j′_a} ) ( Π_{c=1, c≠a}^{n} (W⁻¹)^{j_c}_{j′_c} ) T^{i_1···i_m}_{j_1···j_n}
        + (W^{−T})_{l′}^{l} W^{i′_1}_{i_1} ··· W^{i′_m}_{i_m} (W⁻¹)^{j_1}_{j′_1} ··· (W⁻¹)^{j_n}_{j′_n} ∂_l T^{i_1···i_m}_{j_1···j_n}.    (B3)

(1, 0) vector f:

    J′^i_j ≡ ∂′_j f′^i = (W^{−T})_j^k ∂_k ( W^i_l f^l )
        = (W^{−T})_j^k (∂_k W^i_l) f^l + (W^{−T})_j^k W^i_l ∂_k f^l.    (B4)

(0, 2) tensor g:

    J′_{ijk} ≡ ∂′_k g′_{ij} = (W^{−T})_k^l ∂_l [ (W^{−T})_i^m g_{mn} (W⁻¹)^n_j ]
        = (W^{−T})_k^l (∂_l (W^{−T})_i^m) g_{mn} (W⁻¹)^n_j
        + (W^{−T})_k^l (W^{−T})_i^m (∂_l g_{mn}) (W⁻¹)^n_j
        + (W^{−T})_k^l (W^{−T})_i^m g_{mn} ∂_l (W⁻¹)^n_j.    (B5)

Appendix C: Physical systems

For convenience, we here review the well-known equations of the physical systems we test our method on in the
paper.

1. Double pendulum

The dynamics of a double pendulum can be described by two angles (θ1 , θ2 ) and corresponding angular velocities
(θ̇1, θ̇2):

    d/dt [θ1; θ2; θ̇1; θ̇2] =
      [ θ̇1;
        θ̇2;
        (m2 l1 θ̇1² sin(θ2−θ1) cos(θ2−θ1) + m2 g sinθ2 cos(θ2−θ1) + m2 l2 θ̇2² sin(θ2−θ1) − (m1+m2) g sinθ1) / ((m1+m2) l1 − m2 l1 cos²(θ2−θ1));
        (−m2 l2 θ̇2² sin(θ2−θ1) cos(θ2−θ1) + (m1+m2)(g sinθ1 cos(θ2−θ1) − l1 θ̇1² sin(θ2−θ1) − g sinθ2)) / ((m1+m2) l2 − m2 l2 cos²(θ2−θ1)) ].    (C1)

The canonical coordinates corresponding to (θ̇1, θ̇2) are (p1, p2), given by

    p1 = (m1 + m2) l1² θ̇1 + m2 l1 l2 θ̇2 cos(θ1 − θ2),
    p2 = m2 l2² θ̇2 + m2 l1 l2 θ̇1 cos(θ1 − θ2).    (C2)

The dynamical equations (C1) can be rewritten in terms of (θ1, θ2, p1, p2):

    d/dt [θ1; θ2; p1; p2] =
      [ (l2 p1 − l1 p2 cos(θ1−θ2)) / (l1² l2 (m1 + m2 sin²(θ1−θ2)));
        (l1 (m1+m2) p2 − l2 m2 p1 cos(θ1−θ2)) / (m2 l1 l2² (m1 + m2 sin²(θ1−θ2)));
        −(m1+m2) g l1 sinθ1 − C1 + C2;
        −m2 g l2 sinθ2 + C1 − C2 ],    (C3)

where

    C1 = p1 p2 sin(θ1−θ2) / (l1 l2 (m1 + m2 sin²(θ1−θ2))),
    C2 = [m2 l2² p1² + (m1+m2) l1² p2² − 2 m2 l1 l2 p1 p2 cos(θ1−θ2)] sin(2(θ1−θ2)) / (2 l1² l2² (m1 + m2 sin²(θ1−θ2))²).

In the small-angle regime θ1, θ2 ≪ 1, after using the approximations sin θ1 ≈ θ1, sin θ2 ≈ θ2, cos(θ1 − θ2) ≈ 1, θ̇1² ≈ 0, θ̇2² ≈ 0 and setting l1 = l2 = l, (C1)-(C3) simplify to

    d/dt [θ1; θ2; θ̇1; θ̇2] = [ 0, 0, 1, 0;
                               0, 0, 0, 1;
                               −(m1+m2)g/(m1 l), m2 g/(m1 l), 0, 0;
                               (m1+m2)g/(m1 l), −(m1+m2)g/(m1 l), 0, 0 ] [θ1; θ2; θ̇1; θ̇2],    (C4)

    p1 = (m1 + m2) l² θ̇1 + m2 l² θ̇2,
    p2 = m2 l² θ̇2 + m2 l² θ̇1,    (C5)

    d/dt [θ1; θ2; p1; p2] = [ 0, 0, 1/(m1 l²), −1/(m1 l²);
                              0, 0, −1/(m1 l²), (m1+m2)/(m1 m2 l²);
                              −(m1+m2) g l, 0, 0, 0;
                              0, −m2 g l, 0, 0 ] [θ1; θ2; p1; p2],    (C6)

    [θ1; θ̇1; θ2; θ̇2] = [ 1, 0, 0, 0;
                          0, 1/(m1 l²), 0, −1/(m1 l²);
                          0, 0, 1, 0;
                          0, −1/(m1 l²), 0, (m1+m2)/(m1 m2 l²) ] [θ1; p1; θ2; p2].    (C7)

From Eq. (C4), we can write θ1 and θ2 as linear combinations of two normal modes θ±(t) that oscillate independently with frequencies ω±:

    ω±² = (g/(m1 l)) [ m1 + m2 ± √((m1 + m2) m2) ],
    θ1 = θ+ − θ−,    θ2 = −√((m1 + m2)/m2) (θ+ + θ−).    (C8)

2. Friedmann–Robertson–Walker (FRW) metric

The Friedmann-Robertson-Walker (FRW) metric in spherical and in Cartesian coordinates reads

    g^FRW_spherical(t, r, θ, φ) = diag( 1, −a(t)²/(1 − kr²), −a(t)² r², −a(t)² r² sin²θ ),

    g^FRW_cartesian(t, x, y, z):  g_00 = 1,  g_0i = 0,  g_ij = −a(t)² [ δ_ij + k x_i x_j/(1 − kr²) ],    (C9)

where r² ≡ x² + y² + z² and i, j label the spatial coordinates (x, y, z).

When k = −1 and a(t) = t, the FRW metric can be transformed to the Minkowski metric via a global transformation:

    (t′, x′, y′, z′) = ( t√(1 + r²), tx, ty, tz ).    (C10)
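
This is easy to verify numerically. The sketch below (ours, assuming PyTorch) pulls the Minkowski metric η back through (C10) and recovers the k = −1 FRW metric, confirming that g_FRW = W^t η W with W ≡ ∂z′/∂z:

    import torch

    def to_minkowski(z):  # the transformation (C10)
        t, x, y, w = z
        r2 = x*x + y*y + w*w
        return torch.stack([t*torch.sqrt(1 + r2), t*x, t*y, t*w])

    def frw_metric(z, k=-1.0):  # Cartesian FRW metric (C9) with a(t) = t
        t, xs = z[0], z[1:]
        g = torch.zeros(4, 4, dtype=z.dtype)
        g[0, 0] = 1.0
        g[1:, 1:] = -(t*t)*(torch.eye(3, dtype=z.dtype)
                            + k*torch.outer(xs, xs)/(1 - k*(xs*xs).sum()))
        return g

    eta = torch.diag(torch.tensor([1., -1., -1., -1.], dtype=torch.float64))
    z = torch.tensor([1.3, 0.2, -0.4, 0.5], dtype=torch.float64)
    W = torch.func.jacfwd(to_minkowski)(z)  # W^i_j = dz'^i/dz^j
    print(torch.allclose(W.T @ eta @ W, frw_metric(z)))  # True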

3. Schwarzschild metric and GP coordinate

The Schwarzschild metric in spherical and in Cartesian coordinates reads

    g^Sch_spherical(t, r, θ, φ) = diag( 1 − 2M/r, −(1 − 2M/r)⁻¹, −r², −r² sin²θ ),

    g^Sch_cartesian(t, x, y, z):  g_00 = 1 − 2M/r,  g_0i = 0,  g_ij = −δ_ij − (2M/((r − 2M) r²)) x_i x_j.    (C11)

The spatial part of the metric diverges at r = 2M. However, if we transform to the Gullstrand-Painlevé coordinates defined by

    (t′, x′, y′, z′) = ( t − 2M( −2u + ln((u + 1)/(u − 1)) ), x, y, z ),    u = √(r/2M),    (C12)

the apparent singularity disappears, and we obtain a spatial metric which, remarkably, is the same as for Euclidean space:

    g^Sch_GP:  g_00 = 1 − 2M/r,  g_0i = −√(2M/r) x_i/r,  g_ij = −δ_ij.    (C13)
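
The same check works here (our sketch, with 2M = 1): transforming the Cartesian Schwarzschild metric (C11) by (C12) via the (0,2) rule g′ = W^{−T} g W⁻¹ of Appendix B reproduces (C13) outside the horizon:

    import torch

    M = 0.5  # so that 2M = 1

    def to_gp(z):  # the transformation (C12)
        t, xs = z[0], z[1:]
        r = xs.norm()
        u = torch.sqrt(r/(2*M))
        tp = t - 2*M*(-2*u + torch.log((u + 1)/(u - 1)))
        return torch.cat([tp.reshape(1), xs])

    def schwarzschild(z):  # Cartesian Schwarzschild metric (C11)
        xs = z[1:]
        r = xs.norm()
        g = torch.zeros(4, 4, dtype=z.dtype)
        g[0, 0] = 1 - 2*M/r
        g[1:, 1:] = -torch.eye(3, dtype=z.dtype) - (2*M/((r - 2*M)*r*r))*torch.outer(xs, xs)
        return g

    def gullstrand_painleve(z):  # the target metric (C13)
        xs = z[1:]
        r = xs.norm()
        g = torch.zeros(4, 4, dtype=z.dtype)
        g[0, 0] = 1 - 2*M/r
        g[0, 1:] = g[1:, 0] = -torch.sqrt(2*M/r)*xs/r
        g[1:, 1:] = -torch.eye(3, dtype=z.dtype)
        return g

    z = torch.tensor([0.7, 1.2, 0.8, -0.6], dtype=torch.float64)  # r > 2M = 1
    W = torch.func.jacfwd(to_gp)(z)
    Winv = torch.linalg.inv(W)
    print(torch.allclose(Winv.T @ schwarzschild(z) @ Winv, gullstrand_painleve(z)))  # True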

Appendix D: Underconstrained problems

We saw that, ironically, the only example where our symbolic regression failed to find an exact formula for the
discovered coordinate transformation was the very simplest one: the uniform motion of example A:

    ẋ = p,
    ṗ = 0.    (D1)

The reason for this is that the final equations with manifest symmetry are so simple that there are infinitely many different transformations that produce them, and our neural network has no incentive to find the simplest one. Specifically, any coordinate transformation of the form x′ = c1(p)x + c2(p) and p′ = c3(p) preserves both translational invariance, because

    ẋ′ = c1(p)ẋ + c1′(p)x ṗ + c2′(p)ṗ = c1(p)p = c1(c3⁻¹(p′)) c3⁻¹(p′) ≡ A(p′),
    ṗ′ = c3′(p)ṗ = 0,    (D2)

and Hamiltonicity, with Hamiltonian H = ∫ A(p′) dp′.
