Low-Dimensional Hyperbolic Knowledge Graph Embeddings
Ines Chami1∗, Adva Wolf1, Da-Cheng Juan2, Frederic Sala1, Sujith Ravi3† and Christopher Ré1
1 Stanford University  2 Google Research  3 Amazon Alexa
{chami,advaw,fredsala,chrismre}@cs.stanford.edu
dacheng@google.com, sravi@sravi.org
Abstract

Knowledge graph (KG) embeddings learn low-dimensional representations of entities and relations to predict missing facts. KGs often exhibit hierarchical and logical patterns which must be preserved in the embedding space. For hierarchical data, hyperbolic embedding methods have shown promise for high-fidelity and parsimonious representations. However, existing hyperbolic embedding methods do not account for the rich logical patterns in KGs. In this work, we introduce a class of hyperbolic KG embedding models that simultaneously capture hierarchical and logical patterns. Our approach combines hyperbolic reflections and rotations with attention to model complex relational patterns. Experimental results on standard KG benchmarks show that our method improves over previous Euclidean- and hyperbolic-based efforts by up to 6.1% in mean reciprocal rank (MRR) in low dimensions. Furthermore, we observe that different geometric transformations capture different types of relations, while attention-based transformations generalize to multiple relations. In high dimensions, our approach yields new state-of-the-art MRRs of 49.6% on WN18RR and 57.7% on YAGO3-10.

Figure 1: A toy example showing how KGs can simultaneously exhibit hierarchies and logical patterns.

1 Introduction

Knowledge graphs (KGs), consisting of (head entity, relationship, tail entity) triples, are popular data structures for representing factual knowledge to be queried and used in downstream applications such as word sense disambiguation, question answering, and information extraction. Real-world KGs such as Yago (Suchanek et al., 2007) or WordNet (Miller, 1995) are usually incomplete, so a common approach to predicting missing links in KGs is via embedding into vector spaces. Embedding methods learn representations of entities and relationships that preserve the information found in the graph, and have achieved promising results for many tasks.

Relations found in KGs have differing properties: for example, (Michelle Obama, married to, Barack Obama) is symmetric, whereas hypernym relations like (cat, specific type of, feline) are not (Figure 1). These distinctions present a challenge to embedding methods: preserving each type of behavior requires producing a different geometric pattern in the embedding space. One popular approach is to use extremely high-dimensional embeddings, which offer more flexibility for such patterns. However, given the large number of entities found in KGs, doing so yields very high memory costs.

For hierarchical data, hyperbolic geometry offers an exciting approach to learn low-dimensional embeddings while preserving latent hierarchies. Hyperbolic space can embed trees with arbitrarily low distortion in just two dimensions. Recent research has proposed embedding hierarchical graphs into these spaces instead of conventional Euclidean space (Nickel and Kiela, 2017; Sala et al., 2018). However, these works focus on embedding simpler graphs (e.g., weighted trees) and cannot express the diverse and complex relationships in KGs.

We propose a new hyperbolic embedding approach that captures such patterns to achieve the best of both worlds. Our proposed approach produces the parsimonious representations offered by hyperbolic space, especially suitable for hierarchical relations, and is effective even with low-dimensional embeddings. It also uses rich transformations to encode logical patterns in KGs, previously only defined in Euclidean space. To accomplish this, we (1) train hyperbolic embeddings with relation-specific curvatures to preserve multiple hierarchies in KGs; (2) parameterize hyperbolic isometries (distance-preserving operations) and leverage their geometric properties to capture relations' logical patterns, such as symmetry or anti-symmetry; and (3) use a notion of hyperbolic attention to combine geometric operators and capture multiple logical patterns.

We evaluate the performance of our approach, ATT H, on the KG link prediction task using the standard WN18RR (Dettmers et al., 2018; Bordes et al., 2013), FB15k-237 (Toutanova and Chen, 2015) and YAGO3-10 (Mahdisoltani et al., 2013) benchmarks. (1) In low (32) dimensions, we improve over Euclidean-based models by up to 6.1% in the mean reciprocal rank (MRR) metric. In particular, we find that hierarchical relationships, such as WordNet's hypernym and member meronym, significantly benefit from hyperbolic space; we observe a 16% to 24% relative improvement versus Euclidean baselines. (2) We find that geometric properties of hyperbolic isometries directly map to logical properties of relationships. We study symmetric and anti-symmetric patterns and find that reflections capture symmetric relations while rotations capture anti-symmetry. (3) We show that attention-based transformations have the ability to generalize to multiple logical patterns. For instance, we observe that ATT H recovers reflections for symmetric relations and rotations for anti-symmetric ones.

In high (500) dimensions, we find that both hyperbolic and Euclidean embeddings achieve similar performance, and our approach achieves new state-of-the-art (SotA) results, obtaining 49.6% MRR on WN18RR and 57.7% on YAGO3-10. Our experiments show that trainable curvature is critical to generalize hyperbolic embedding methods to high dimensions. Finally, we visualize embeddings learned in hyperbolic spaces and show that hyperbolic geometry effectively preserves hierarchies in KGs.

∗ Work partially done during an internship at Google.
† Work done while at Google AI.

2 Related Work

Previous methods for KG embeddings also rely on geometric properties. Improvements have been obtained by exploiting either more sophisticated spaces (e.g., going from Euclidean to complex or hyperbolic space) or more sophisticated operations (e.g., from translations to isometries, or to learning graph neural networks). In contrast, our approach takes a step forward in both directions.

Euclidean embeddings In the past decade, there has been a rich literature on Euclidean embeddings for KG representation learning. These include translation approaches (Bordes et al., 2013; Ji et al., 2015; Wang et al., 2014; Lin et al., 2015) or tensor factorization methods such as RESCAL (Nickel et al., 2011) or DistMult (Yang et al., 2015). While these methods are fairly simple and have few parameters, they fail to encode important logical properties (e.g., translations cannot encode symmetry).

Complex embeddings Recently, there has been interest in learning embeddings in complex space, as in the ComplEx (Trouillon et al., 2016) and RotatE (Sun et al., 2019) models. RotatE learns rotations in complex space, which are very effective in capturing logical properties such as symmetry, anti-symmetry, composition or inversion. The recent QuatE model (Zhang et al., 2019) learns KG embeddings using quaternions. However, a downside is that these embeddings require very high-dimensional spaces, leading to high memory costs.

Deep neural networks Another family of methods uses neural networks to produce KG embeddings. For instance, R-GCN (Schlichtkrull et al., 2018) extends graph neural networks to the multi-relational setting by adding a relation-specific aggregation step. ConvE and ConvKB (Dettmers et al., 2018; Nguyen et al., 2018) leverage the expressiveness of convolutional neural networks to learn entity embeddings and relation embeddings. More recently, the KBGAT (Nathani et al., 2019) and A2N (Bansal et al., 2019) models use graph attention networks for knowledge graph embeddings. A downside of these methods is that they are computationally expensive, as they usually require pre-trained KG embeddings as input for the neural network.

Hyperbolic embeddings To the best of our knowledge, MuRP (Balažević et al., 2019) is the only method that learns KG embeddings in hyperbolic space in order to target hierarchical data. MuRP minimizes hyperbolic distances between a re-scaled version of the head entity embedding and a translation of the tail entity embedding. It achieves promising results using hyperbolic embeddings with fewer dimensions than its Euclidean analogues. However, MuRP is a translation model and fails to encode some logical properties of relationships. Furthermore, its embeddings are learned in a hyperbolic space with fixed curvature, potentially leading to insufficient precision, and training relies on cumbersome Riemannian optimization. Instead, our proposed method leverages expressive hyperbolic isometries to simultaneously capture logical patterns and hierarchies. Furthermore, embeddings are learned using tangent-space (i.e., Euclidean) optimization methods and trainable hyperbolic curvatures per relationship, avoiding precision errors that might arise when using a fixed curvature, and providing flexibility to encode multiple hierarchies.

3 Problem Formulation and Background

We describe the KG embedding problem setting and give some necessary background on hyperbolic geometry.

3.1 Knowledge graph embeddings

In the KG embedding problem, we are given a set of triples $(h, r, t) \in \mathcal{E} \subseteq \mathcal{V} \times \mathcal{R} \times \mathcal{V}$, where $\mathcal{V}$ and $\mathcal{R}$ are entity and relationship sets, respectively. The goal is to map entities $v \in \mathcal{V}$ to embeddings $e_v \in \mathcal{U}^{d_\mathcal{V}}$ and relationships $r \in \mathcal{R}$ to embeddings $r_r \in \mathcal{U}^{d_\mathcal{R}}$, for some choice of space $\mathcal{U}$ (traditionally $\mathbb{R}$), such that the KG structure is preserved.

Concretely, the data is split into $\mathcal{E}_{Train}$ and $\mathcal{E}_{Test}$ triples. Embeddings are learned by optimizing a scoring function $s : \mathcal{V} \times \mathcal{R} \times \mathcal{V} \to \mathbb{R}$, which measures triples' likelihoods. $s(\cdot, \cdot, \cdot)$ is trained using triples in $\mathcal{E}_{Train}$, and the learned embeddings are then used to predict scores for triples in $\mathcal{E}_{Test}$. The goal is to learn embeddings such that the scores of triples in $\mathcal{E}_{Test}$ are high compared to triples that are not present in $\mathcal{E}$.

3.2 Hyperbolic geometry

We briefly review key notions from hyperbolic geometry; a more in-depth treatment is available in standard texts (Robbin and Salamon). Hyperbolic geometry is a non-Euclidean geometry with constant negative curvature. In this work, we use the d-dimensional Poincaré ball model with negative curvature $-c$ ($c > 0$): $\mathbb{B}^{d,c} = \{x \in \mathbb{R}^d : \|x\|^2 < \frac{1}{c}\}$, where $\|\cdot\|$ denotes the L2 norm. For each point $x \in \mathbb{B}^{d,c}$, the tangent space $T_x^c$ is a d-dimensional vector space containing all possible directions of paths in $\mathbb{B}^{d,c}$ leaving from $x$.

The tangent space $T_x^c$ maps to $\mathbb{B}^{d,c}$ via the exponential map (Figure 2), and conversely, the logarithmic map maps $\mathbb{B}^{d,c}$ to $T_x^c$. In particular, we have closed-form expressions for these maps at the origin:

$$\exp_0^c(v) = \tanh(\sqrt{c}\,\|v\|)\,\frac{v}{\sqrt{c}\,\|v\|}, \quad (1)$$

$$\log_0^c(y) = \mathrm{arctanh}(\sqrt{c}\,\|y\|)\,\frac{y}{\sqrt{c}\,\|y\|}. \quad (2)$$

Figure 2: An illustration of the exponential map $\exp_x(v)$, which maps the tangent space $T_x\mathcal{M}$ at the point $x$ to the hyperbolic manifold $\mathcal{M}$.

Vector addition is not well-defined in hyperbolic space (adding two points in the Poincaré ball might result in a point outside the ball). Instead, Möbius addition $\oplus^c$ (Ganea et al., 2018) provides an analogue to Euclidean addition for hyperbolic space. We give its closed-form expression in Appendix A.1. Finally, the hyperbolic distance on $\mathbb{B}^{d,c}$ has the explicit formula:

$$d^c(x, y) = \frac{2}{\sqrt{c}}\,\mathrm{arctanh}(\sqrt{c}\,\|{-x} \oplus^c y\|). \quad (3)$$

4 Methodology

The goal of this work is to learn parsimonious hyperbolic embeddings that can encode complex logical patterns such as symmetry, anti-symmetry, or inversion while preserving latent hierarchies. Our model, ATT H, (1) learns KG embeddings in hyperbolic space in order to preserve hierarchies (Section 4.1), (2) uses a class of hyperbolic isometries parameterized by compositions of Givens transformations to encode logical patterns (Section 4.2), and (3) combines these isometries with hyperbolic attention (Section 4.3). We describe the full model in Section 4.4.
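For concreteness, the closed-form operations of Section 3.2 translate directly into code. The following is a minimal NumPy sketch of the origin exponential and logarithmic maps, Möbius addition, and the hyperbolic distance; the function names are ours (illustrative, not those of the released implementation), and the maps assume $\sqrt{c}\,\|x\| < 1$ so that arctanh is defined.

```python
import numpy as np

def expmap0(v, c):
    """Exponential map at the origin of the Poincare ball B^{d,c} (Eq. 1)."""
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    return np.tanh(np.sqrt(c) * norm) * v / (np.sqrt(c) * norm)

def logmap0(y, c):
    """Logarithmic map at the origin (Eq. 2); inverse of expmap0."""
    norm = np.linalg.norm(y)
    if norm == 0:
        return y
    return np.arctanh(np.sqrt(c) * norm) * y / (np.sqrt(c) * norm)

def mobius_add(x, y, c):
    """Mobius addition, the hyperbolic analogue of vector addition
    (closed form from Ganea et al., 2018; see Appendix A.1)."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den

def hyp_distance(x, y, c):
    """Hyperbolic distance on B^{d,c} (Eq. 3)."""
    diff = mobius_add(-x, y, c)
    return 2.0 / np.sqrt(c) * np.arctanh(np.sqrt(c) * np.linalg.norm(diff))
```

Because `logmap0` inverts `expmap0`, parameters can be stored and updated in the tangent space at the origin; this is the tangent-space (Euclidean) optimization strategy mentioned in Section 2.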
4.1 Hierarchies in hyperbolic space

As described, hyperbolic embeddings enable us …
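The derivation of Sections 4.1–4.3 uses isometries parameterized by compositions of 2×2 Givens transformations, as mentioned in the Section 4 overview. The core ingredient can be sketched as a block-diagonal map that rotates or reflects each consecutive pair of coordinates by a learned angle; the pairing convention and function name below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def givens_transform(x, theta, reflection=False):
    """Apply a block-diagonal isometry made of 2x2 Givens blocks to x.

    Coordinates are grouped in consecutive pairs; pair i is multiplied by the
    2x2 rotation [[cos t_i, -sin t_i], [sin t_i, cos t_i]] or, if `reflection`
    is set, by the reflection [[cos t_i, sin t_i], [sin t_i, -cos t_i]].
    Rotations invert by negating the angles; reflections are involutions
    (applying the same reflection twice gives the identity).
    """
    pairs = x.reshape(-1, 2)
    cos, sin = np.cos(theta), np.sin(theta)
    if reflection:
        out = np.stack([cos * pairs[:, 0] + sin * pairs[:, 1],
                        sin * pairs[:, 0] - cos * pairs[:, 1]], axis=1)
    else:
        out = np.stack([cos * pairs[:, 0] - sin * pairs[:, 1],
                        sin * pairs[:, 0] + cos * pairs[:, 1]], axis=1)
    return out.reshape(-1)
```

These algebraic properties (invertible rotations, involutive reflections) are what Section 5.3 connects to anti-symmetric and symmetric relations, respectively.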
Table 2: Link prediction results for low-dimensional embeddings (d = 32) in the filtered setting. Best score in bold
and best published underlined. Hyperbolic isometries significantly outperform Euclidean baselines on WN18RR
and YAGO3-10, both of which exhibit hierarchical structures.
Evaluation metrics At test time, we use the scoring function in Equation 10 to rank the correct tail or head entity against all possible entities, and use inverse relations for head prediction (Lacroix et al., 2018).

Figure 4: Mean reciprocal rank (MRR) vs. embedding dimension.
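Given the filtered ranks produced by this procedure, the reported metrics reduce to a few lines. The sketch below assumes the ranks (1 = best) have already been computed and filtered; the helper name is ours.

```python
import numpy as np

def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """Mean reciprocal rank and hits@k from filtered ranks (1 = best).

    `ranks` holds, for each test triple, the rank of the correct entity among
    all candidates after filtering out other true triples.
    """
    ranks = np.asarray(ranks, dtype=float)
    mrr = float(np.mean(1.0 / ranks))
    hits = {k: float(np.mean(ranks <= k)) for k in ks}
    return mrr, hits
```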
…hyperparameters. Our best model hyperparameters are detailed in Appendix A.3. We conducted all our experiments on NVIDIA Tesla P100 GPUs and make our implementation publicly available (code at https://github.com/tensorflow/neural-structured-learning/tree/master/research/kg_hyp_emb).

Table 2: Link prediction results for low-dimensional embeddings (d = 32) in the filtered setting. Best score in bold and best published underlined. Hyperbolic isometries significantly outperform Euclidean baselines on WN18RR and YAGO3-10, both of which exhibit hierarchical structures.

Table 3: Comparison of H@10 for WN18RR relations. Higher KhsG and lower ξG means more hierarchical.

Table 4: Comparison of geometric transformations on a subset of YAGO3-10 relations.

5.2 Results in low dimensions

We first evaluate our approach in the low-dimensional setting for d = 32, which is approximately one order of magnitude smaller than SotA Euclidean methods. Table 2 compares the performance of ATT H to that of other baselines, including the recent hyperbolic (but not rotation-based) MuRP model. In low dimensions, hyperbolic embeddings offer much better representations for hierarchical relations, confirming our hypothesis. ATT H improves over previous Euclidean and hyperbolic methods by 0.7% and 6.1% points in MRR on WN18RR and YAGO3-10, respectively. Both datasets have multiple hierarchical relationships, suggesting that the hierarchical structure imposed by hyperbolic geometry leads to better embeddings. On FB15k-237, ATT H and MuRP achieve similar performance, both improving over Euclidean baselines. We conjecture that translations are sufficient to model relational patterns in FB15k-237.

To understand the role of dimensionality, we also conduct experiments on WN18RR against SotA methods under varied low-dimensional settings (Figure 4). We include error bars for our method with average MRR and standard deviation computed over 10 runs. Our approach consistently outperforms all baselines, suggesting that hyperbolic embeddings attain high accuracy across a broad range of dimensions.

Additionally, we measure performance per relation on WN18RR in Table 3 to understand the benefits of hyperbolic geometry on hierarchical relations. We report the Krackhardt hierarchy score (KhsG) (Balažević et al., 2019) and estimated curvature per relation (see Appendix A.2 for more details). We consider a relation to be hierarchical when its corresponding graph is close to tree-like (low curvature, high KhsG). We observe that hyperbolic embeddings offer much better performance on hierarchical relations such as hypernym or has part, while Euclidean and hyperbolic embeddings have similar performance on non-hierarchical relations such as verb group. We also plot the learned curvature per relation versus the embedding dimension in Figure 5b. We note that the learned curvature in low dimensions directly correlates with the estimated graph curvature ξG in Table 3, suggesting that the model with learned curvatures learns more "curved" embedding spaces for tree-like relations.

Finally, we observe that MuRP achieves lower performance than MuRE on YAGO3-10, while ATT H improves over ATT E by 2.3% in MRR. This suggests that trainable curvature is critical to learn embeddings with the right amount of curvature, while fixed curvature might degrade performance. We elaborate further on this point in Section 5.5.

5.3 Hyperbolic rotations and reflections

In our experiments, we find that rotations work well on WN18RR, which contains multiple hierarchical and anti-symmetric relations, while reflections work better for YAGO3-10 (Table 5). To better understand the mechanisms behind these observations, we analyze two specific patterns: relation symmetry and anti-symmetry. We report performance per relation on a subset of YAGO3-10 relations in Table 4. We categorize relations as symmetric, anti-symmetric, or neither, using data statistics. More concretely, we consider a relation to satisfy a logical pattern when the logical condition is satisfied by most of the triples (e.g., a relation r is symmetric if for most KG triples (h, r, t), (t, r, h) is also in the KG).
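This data-driven categorization can be sketched as follows. The 0.8 threshold is an illustrative assumption, since the text only requires the condition to hold for "most" triples; the function name is ours.

```python
def classify_relations(triples, threshold=0.8):
    """Label each relation symmetric / anti-symmetric / neither from counts.

    A relation is called symmetric when the reverse triple (t, r, h) exists
    for at least `threshold` of its triples, and anti-symmetric when it
    exists for at most 1 - `threshold` of them.
    """
    triple_set = set(triples)
    by_rel = {}
    for h, r, t in triples:
        by_rel.setdefault(r, []).append((h, t))
    labels = {}
    for r, pairs in by_rel.items():
        reversed_frac = sum((t, r, h) in triple_set for h, t in pairs) / len(pairs)
        if reversed_frac >= threshold:
            labels[r] = "symmetric"
        elif reversed_frac <= 1 - threshold:
            labels[r] = "anti-symmetric"
        else:
            labels[r] = "neither"
    return labels
```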
(a) MRR vs. embedding dimension for RotE (zero curvature), RotH (fixed curvature), and RotH (trainable curvatures) on WN18RR. (b) Absolute curvature per relation vs. embedding dimension, learned with ROT H on WN18RR.

Figure 5: (a): ROT H offers improved performance in low dimensions; in high dimensions, fixed curvature degrades performance, while trainable curvature approximately recovers Euclidean space. (b): As the dimension increases, the learned curvature of hierarchical relationships tends to zero.
We observe that reflections encode symmetric relations particularly well, while rotations are well suited for anti-symmetric relations. This confirms our intuition, and the motivation for our approach, that particular geometric properties capture different kinds of logical properties.

5.4 Attention-based transformations

One advantage of using relation-specific transformations is that each relation can learn the right geometric operators based on the logical properties it has to satisfy. In particular, we observe that in both low- and high-dimensional settings, attention-based models can recover the performance of the best transformation on all datasets (Tables 2 and 5). Additionally, per-relationship results on YAGO3-10 in Table 4 suggest that ATT H indeed recovers the best geometric operation.

Furthermore, for relations that are neither symmetric nor anti-symmetric, we find that ATT H can outperform rotations and reflections, suggesting that combining multiple operators with attention can learn more expressive operators to model mixed logical patterns. In other words, attention-based transformations alleviate the need to conduct experiments with multiple geometric transformations by simply allowing the model to choose the best one for a given relation.

5.5 Results in high dimensions

In high dimensions (Table 5), we compare against a variety of other models and achieve new SotA results on WN18RR and YAGO3-10, and third-best results on FB15k-237. As expected, when the embedding dimension is large, Euclidean and hyperbolic embedding methods perform similarly across all datasets. We explain this behavior by noting that when the dimension is sufficiently large, both Euclidean and hyperbolic spaces have enough capacity to represent complex hierarchies in KGs. This is further supported by Figure 5b, which shows the learned absolute curvature versus the dimension. We observe that curvatures are close to zero in high dimensions, confirming our expectation that ROT H with trainable curvatures learns a roughly Euclidean geometry in this setting.

In contrast, fixed curvature degrades performance in high dimensions (Figure 5a), confirming the importance of trainable curvatures and their impact on precision and capacity (previously studied by Sala et al. (2018)). Additionally, we show the distribution of embedding norms in the Appendix (Figure 7). Fixed curvature results in embeddings being clustered near the boundary of the ball, while trainable curvature adjusts the embedding space to better distribute points throughout the ball. Precision issues that might arise with fixed curvature could also explain MuRP's low performance in high dimensions. Trainable curvatures allow ROT H to perform as well as or better than previous methods in both low and high dimensions.

5.6 Visualizations

In Figure 6, we visualize the embeddings learned by ROT E versus ROT H for a sub-tree of the organism entity in WN18RR. To better visualize the hierarchy, we apply k inverse rotations to all nodes at level k in the tree.
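This de-rotation trick for visualization can be sketched in two dimensions; the function name and the single per-level angle are our assumptions for illustration.

```python
import numpy as np

def derotate(xy, depth, theta):
    """Undo `depth` applications of a 2D rotation by angle `theta`.

    For visualization only: a node at level k of the tree is rotated back by
    k * theta so that parents and children line up along the hierarchy.
    """
    a = -depth * theta
    rot = np.array([[np.cos(a), -np.sin(a)],
                    [np.sin(a),  np.cos(a)]])
    return rot @ xy
```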
(a) ROT E embeddings. (b) ROT H embeddings.

Figure 6: Visualizations of the embeddings learned by ROT E and ROT H on a sub-tree of WN18RR for the hypernym relation. In contrast to ROT E, ROT H preserves hierarchies by learning tree-like embeddings.

By contrast to ROT E, ROT H preserves the tree structure in the embedding space. Furthermore, we note that ROT E cannot simultaneously preserve the tree structure and make non-neighboring nodes far from each other. For instance, virus should be far from male, but preserving the tree structure (by going one level down in the tree) while making these two nodes far from each other is difficult in Euclidean space. In hyperbolic space, however, we observe that going one level down in the tree is achieved by translating embeddings towards the left. This pattern essentially illustrates the translation component in ROT H, allowing the model to simultaneously preserve hierarchies while making non-neighbouring nodes far from each other.

                         WN18RR                   FB15k-237                YAGO3-10
U        Model           MRR  H@1  H@3  H@10     MRR  H@1  H@3  H@10     MRR  H@1  H@3  H@10
R^d      DistMult        .430 .390 .440 .490     .241 .155 .263 .419     .340 .240 .380 .540
R^d      ConvE           .430 .400 .440 .520     .325 .237 .356 .501     .440 .350 .490 .620
R^d      TuckER          .470 .443 .482 .526     .358 .266 .394 .544     -    -    -    -
R^d      MuRE            .475 .436 .487 .554     .336 .245 .370 .521     .532 .444 .584 .694
C^d      ComplEx-N3      .480 .435 .495 .572     .357 .264 .392 .547     .569 .498 .609 .701
C^d      RotatE          .476 .428 .492 .571     .338 .241 .375 .533     .495 .402 .550 .670
H^d      Quaternion      .488 .438 .508 .582     .348 .248 .382 .550     -    -    -    -
B^{d,1}  MuRP            .481 .440 .495 .566     .335 .243 .367 .518     .354 .249 .400 .567
R^d      REF E           .473 .430 .485 .561     .351 .256 .390 .541     .577 .503 .621 .712
R^d      ROT E           .494 .446 .512 .585     .346 .251 .381 .538     .574 .498 .621 .711
R^d      ATT E           .490 .443 .508 .581     .351 .255 .386 .543     .575 .500 .621 .709
B^{d,c}  REF H           .461 .404 .485 .568     .346 .252 .383 .536     .576 .502 .619 .711
B^{d,c}  ROT H           .496 .449 .514 .586     .344 .246 .380 .535     .570 .495 .612 .706
B^{d,c}  ATT H           .486 .443 .499 .573     .348 .252 .384 .540     .568 .493 .612 .702

Table 5: Link prediction results for high-dimensional embeddings (best for d ∈ {200, 400, 500}) in the filtered setting. DistMult, ConvE and ComplEx results are taken from (Dettmers et al., 2018). Best score in bold and best published underlined. ATT E and ATT H have similar performance in the high-dimensional setting, performing competitively with or better than state-of-the-art methods on WN18RR, FB15k-237 and YAGO3-10.

6 Conclusion

We introduce ATT H, a hyperbolic KG embedding model that leverages the expressiveness of hyperbolic space and attention-based geometric transformations to learn improved KG representations in low dimensions. ATT H learns embeddings with trainable hyperbolic curvatures, allowing it to learn the right geometry for each relationship and to generalize across multiple embedding dimensions. ATT H achieves new SotA results on WN18RR and YAGO3-10, real-world KGs which exhibit hierarchical structures. Future directions for this work include exploring other tasks that might benefit from hyperbolic geometry, such as hypernym detection. The proposed attention-based transformations can also be extended to other geometric operations.

Acknowledgements

We thank Avner May for their helpful feedback and discussions. We gratefully acknowledge the support of DARPA under Nos. FA86501827865 (SDH) and FA86501827882 (ASED); NIH under No. U54EB020405 (Mobilize); NSF under Nos. CCF1763315 (Beyond Sparsity), CCF1563078 (Volume to Velocity), and 1937301 (RTML); ONR under No. N000141712266 (Unifying Weak Supervision); the Moore Foundation, NXP, Xilinx, LETI-CEA, Intel, IBM, Microsoft, NEC, Toshiba, TSMC, ARM, Hitachi, BASF, Accenture, Ericsson, Qualcomm, Analog Devices, the Okawa Foundation, American Family Insurance, Google Cloud, Swiss Re, the HAI-AWS Cloud Credits for Research program, TOTAL, and members of the Stanford DAWN project: Teradata, Facebook, Google, Ant Financial, NEC, VMWare, and Infosys. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views, policies, or endorsements, either expressed or implied, of DARPA, NIH, ONR, or the U.S. Government.
References

Ivana Balažević, Carl Allen, and Timothy Hospedales. 2019. Multi-relational Poincaré graph embeddings. In Advances in Neural Information Processing Systems, pages 4465–4475.

Ivana Balazevic, Carl Allen, and Timothy Hospedales. 2019. TuckER: Tensor factorization for knowledge graph completion. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5188–5197.

Trapit Bansal, Da-Cheng Juan, Sujith Ravi, and Andrew McCallum. 2019. A2N: Attending to neighbors for knowledge graph inference. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4387–4392.

Silvere Bonnabel. 2013. Stochastic gradient descent on Riemannian manifolds. IEEE Transactions on Automatic Control, 58(9):2217–2229.

Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, pages 2787–2795.

Ines Chami, Zhitao Ying, Christopher Ré, and Jure Leskovec. 2019. Hyperbolic graph convolutional neural networks. In Advances in Neural Information Processing Systems, pages 4869–4880.

Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. 2018. Convolutional 2D knowledge graph embeddings. In Thirty-Second AAAI Conference on Artificial Intelligence.

John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121–2159.

Octavian Ganea, Gary Bécigneul, and Thomas Hofmann. 2018. Hyperbolic neural networks. In Advances in Neural Information Processing Systems.

Albert Gu, Fred Sala, Beliz Gunel, and Christopher Ré. 2019. Learning mixed-curvature representations in product spaces. In International Conference on Learning Representations.

Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 687–696.

Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In International Conference for Learning Representations.

David Krackhardt. 1994. Graph theoretical dimensions of informal organizations. In Computational organization theory, pages 107–130. Psychology Press.

Timothée Lacroix, Nicolas Usunier, and Guillaume Obozinski. 2018. Canonical tensor decomposition for knowledge base completion. In International Conference on Machine Learning.

Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. In Twenty-Ninth AAAI Conference on Artificial Intelligence.

Qi Liu, Maximilian Nickel, and Douwe Kiela. 2019. Hyperbolic graph neural networks. In Advances in Neural Information Processing Systems, pages 8228–8239.

Farzaneh Mahdisoltani, Joanna Biega, and Fabian M Suchanek. 2013. YAGO3: A knowledge base from multilingual Wikipedias.

George A Miller. 1995. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41.

Deepak Nathani, Jatin Chauhan, Charu Sharma, and Manohar Kaul. 2019. Learning attention-based embeddings for relation prediction in knowledge graphs. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics.

Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen, and Dinh Phung. 2018. A novel embedding model for knowledge base completion based on convolutional neural network. In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 327–333.

Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2011. A three-way model for collective learning on multi-relational data. In International Conference on Machine Learning, pages 809–816. Omnipress.

Maximillian Nickel and Douwe Kiela. 2017. Poincaré embeddings for learning hierarchical representations. In Advances in Neural Information Processing Systems, pages 6338–6347.

Joel W Robbin and Dietmar A Salamon. Introduction to differential geometry.

Frederic Sala, Chris De Sa, Albert Gu, and Christopher Ré. 2018. Representation tradeoffs for hyperbolic embeddings. In International Conference on Machine Learning, pages 4457–4466.

Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pages 593–607. Springer.
Fabian M Suchanek, Gjergji Kasneci, and Gerhard
Weikum. 2007. Yago: a core of semantic knowledge.
In Proceedings of the 16th international conference
on World Wide Web, pages 697–706. ACM.
Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian
Tang. 2019. Rotate: Knowledge graph embedding
by relational rotation in complex space. In Interna-
tional Conference on Learning Representations.
Alexandru Tifrea, Gary Bécigneul, and Octavian-
Eugen Ganea. 2019. Poincaré GloVe: Hyperbolic
word embeddings. In International Conference on
Learning Representations.
Kristina Toutanova and Danqi Chen. 2015. Observed
versus latent features for knowledge base and text
inference. In Proceedings of the 3rd Workshop on
Continuous Vector Space Models and their Compo-
sitionality, pages 57–66.
…is the standard Euclidean addition. Analogously, the Möbius addition satisfies (Ganea et al., 2018):

$$x \oplus^c y = \exp_x^c\big(P^c_{0 \to x}(\log_0^c(y))\big).$$

A.2 Hierarchy estimates

Curvature estimate To estimate the curvature of a relation $r$, we restrict to the undirected graph $G_r$ spanned by the edges labeled as $r$. Following (Gu et al., 2019), let $\xi_{G_r}(a, b, c)$ be the curvature estimate of a triangle in $G_r$ with vertices $\{a, b, c\}$ … in the component $c_{i,r}$. $\xi_{G_r}$ is the mean of the estimated curvatures of the sampled triangles. For the full graph, we take the weighted average of the relation curvatures $\xi_{G_r}$ with respect to the weights

$$\frac{\sum_{i=1}^{m_r} N_{i,r}^3}{\sum_r \sum_{i=1}^{m_r} N_{i,r}^3}.$$

See (Krackhardt, 1994) for more details. We note that for fully observed symmetric relations (each edge is in a two-edge loop), $\mathrm{Khs}_{G_r} = 0$, while for anti-symmetric relations (no small loops), $\mathrm{Khs}_{G_r} = 1$.

Figure 8: The curvature estimate $\xi_G$ and the Krackhardt hierarchy score $\mathrm{Khs}_G$ for several simple graphs (from top-left to bottom-right: $\xi_G < 0, \mathrm{Khs}_G = 1$; $\xi_G < 0, \mathrm{Khs}_G = 0$; $\xi_G = 0, \mathrm{Khs}_G = 1$; $\xi_G = 0, \mathrm{Khs}_G = 0$). The top-left graph is the most hierarchical, while the bottom-right graph is the least hierarchical.

Table 7: Best hyperparameters in low- and high-dimensional settings. NA negative samples indicates that the full cross-entropy loss is used, without negative sampling.

…$\{(\Theta_r, \Phi_r, r^E_r, a_r, c_r)_{r \in \mathcal{R}},\ (e^E_v, b_v)_{v \in \mathcal{V}}\}$, which are all Euclidean parameters that can be learned using standard Euclidean optimization techniques.
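The Krackhardt hierarchy score used above can be sketched via graph reachability. This assumes the standard definition from Krackhardt (1994), the fraction of connected ordered pairs that are not reciprocated; the helper names are ours.

```python
from collections import defaultdict, deque

def krackhardt_hierarchy_score(edges):
    """Krackhardt hierarchy score Khs of a directed graph.

    Khs is the fraction of ordered pairs (u, v) with a directed path u -> v
    for which there is no path v -> u. Fully symmetric relations score 0;
    anti-symmetric (loop-free) relations score 1.
    """
    adj = defaultdict(set)
    nodes = set()
    for u, v in edges:
        adj[u].add(v)
        nodes.update((u, v))

    def reachable(src):
        # Breadth-first search for all nodes reachable from src.
        seen, queue = set(), deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    reach = {u: reachable(u) for u in nodes}
    connected = [(u, v) for u in nodes for v in reach[u]]
    if not connected:
        return 0.0
    unreciprocated = sum(u not in reach[v] for u, v in connected)
    return unreciprocated / len(connected)
```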