Synthesis Lectures on
Data, Semantics, and Knowledge

Heiko Paulheim · Petar Ristoski · Jan Portisch

Embedding Knowledge Graphs with RDF2vec
Synthesis Lectures on Data, Semantics, and Knowledge

Series Editors
Ying Ding, The University of Texas at Austin, Austin, USA
Paul Groth, Amsterdam, Noord-Holland, The Netherlands
This series focuses on the pivotal role that data on the web and the emergent technologies
that surround it play both in the evolution of the World Wide Web and in applications in
domains requiring data integration and semantic analysis. The large-scale availability of
both structured and unstructured data on the Web has enabled radically new technologies
to develop. It has impacted developments in a variety of areas including machine learning,
deep learning, semantic search, and natural language processing. Knowledge and semantics
are a critical foundation for the sharing, utilization, and organization of this data. The
series aims both to provide pathways into the field of research and an understanding of
the principles underlying these technologies for an audience of scientists, engineers, and
practitioners.
Heiko Paulheim · Petar Ristoski · Jan Portisch

Embedding Knowledge Graphs with RDF2vec

Heiko Paulheim
University of Mannheim
Mannheim, Germany

Petar Ristoski
eBay (United States)
San Jose, CA, USA

Jan Portisch
SAP SE
Walldorf, Germany

ISSN 2691-2023 ISSN 2691-2031 (electronic)


Synthesis Lectures on Data, Semantics, and Knowledge
ISBN 978-3-031-30386-9 ISBN 978-3-031-30387-6 (eBook)
https://doi.org/10.1007/978-3-031-30387-6

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole
or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage
and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or
hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does
not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective
laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give
a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that
may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

Knowledge graphs are an important ingredient in today’s artificial intelligence systems.


They provide a means to encode arbitrary knowledge to be processed in those AI systems,
allowing an interpretation of that knowledge both for humans and machines. Today, there
are large-scale open knowledge graphs, like Wikidata or DBpedia, as well as privately
owned knowledge graphs in organizations, e.g., the Google knowledge graph used in the
Google search engine.
Knowledge graph embedding is a technique which projects entities and relations in a
knowledge graph into a continuous vector space. Many other components of AI systems,
especially machine learning components, can work with those continuous representations
better than operating on the graph itself, and often yield superior result quality compared
to those trying to extract non-continuous features from a graph.
RDF2vec is a knowledge graph embedding approach which was invented in the scope
of the Mine@LOD project¹ and has evolved since then, leading to numerous variants
of the original approach. There exist different implementations of the approach.
Moreover, the Web page rdf2vec.org² collects far more than 60 applications of
RDF2vec to a large variety of problems in a number of domains, ranging from NLP appli-
cations like information retrieval to improving computer security by utilizing a knowledge
graph of security threats.
With this book, we want to give a gentle introduction to the idea of knowledge graph
embeddings with RDF2vec. We discuss the different variants that exist, including their
advantages and disadvantages, and give examples for using RDF2vec in practice.
Heiko would like to thank all the researchers in his team at the University of
Mannheim, i.e., Andreea Iana, Antonis Klironomos, Franz Krause, Martin Böckling,
Michael Schlechtinger, Nicolas Heist, Sven Hertling, and Tobias Weller, as well as Rita
Sousa from Universidade de Lisboa, who worked on a few interesting extensions for
RDF2vec during her research stay in Mannheim. Moreover, all students who worked with
RDF2vec and provided valuable input and feedback, i.e., Alexander Lütke, Niclas Heilig,
Michael Voit, Angelos Loucas, Rouven Grenz, and Siraj Sheikh Afham Uddin. Finally,

1 https://gepris.dfg.de/gepris/projekt/238007641.
2 http://www.rdf2vec.org/.

my partner Tine for bearing many unsolicited dinner table monologues on graphs, vec-
tors, and stuff, and my daughter Antonia for luring me away from graphs, vectors, and
stuff every once in a while.
Jan would like to thank all researchers from the University of Mannheim who par-
ticipated in lengthy discussions on RDF2vec and graph embeddings, particularly Sven
Hertling, Nicolas Heist, and Andreea Iana. In addition, Jan is grateful for interest-
ing exchanges at SAP, especially with Michael Hladik, Guilherme Costa, and Michael
Monych. Lastly, Jan would like to thank his partner, Isabella, and his best friend, Sophia,
for their continued support in his private life.
Petar would like to thank Heiko Paulheim, Christian Bizer, and Simone Paolo Ponzetto,
for making this work possible in the first place, by forming an ideal environment for
conducting research at the University of Mannheim. Michael Cochez for the insight-
ful discussions and collaboration on extending RDF2vec in many directions. Anna Lisa
Gentile for the collaboration on applying RDF2vec in several research projects during my
time in IBM Research. Finally, my wife, Könül, and my daughter, Ada, for the non-latent,
high-dimensional support and love.
Moreover, the authors would like to thank the developers of pyRDF2vec and the Python
KG extension, which we used for examples in this book, especially Gilles Vandewiele
for quick responses on all issues around pyRDF2vec, and GEval, which has been used
countless times in evaluations. Moreover, we would like to thank all people involved in
the experiments shown in this book: Ahmad Al Taweel, Andreea Iana, Michael Cochez,
and all people who got their hands dirty with RDF2vec.
Finally, we would like to thank the series editors, Paul Groth and Ying Ding, for
inviting us to create this book and providing us with this unique opportunity, and Ambrose
Berkumans, Ben Ingraham, Charles Glaser, and Susanne Filler at Springer Nature for their
support throughout the production of this book.

Mannheim, Germany    Heiko Paulheim
Walldorf, Germany    Jan Portisch
San Jose, USA        Petar Ristoski

February 2023
Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 What is a Knowledge Graph? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 A Short Bit of History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.3 General-Purpose Knowledge Graphs . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Feature Extraction from Knowledge Graphs . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Node Classification in RDF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2 From Word Embeddings to Knowledge Graph Embeddings . . . . . . . . . . . . . . 17
2.1 Word Embeddings with word2vec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Representing Graphs as Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3 Learning Representations from Graph Walks . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4 Software Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.5 Node Classification with RDF2vec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3 Benchmarking Knowledge Graph Embeddings . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.1 Node Classification with Internal Labels—SW4ML . . . . . . . . . . . . . . . . . . . 31
3.2 Machine Learning with External Labels—GEval . . . . . . . . . . . . . . . . . . . . . 33
3.3 Benchmarking Expressivity of Embeddings—DLCC . . . . . . . . . . . . . . . . . . 36
3.3.1 DLCC Gold Standard based on DBpedia . . . . . . . . . . . . . . . . . . . . . 37
3.3.2 DLCC Gold Standard based on Synthetic Data . . . . . . . . . . . . . . . . 40
3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4 Tweaking RDF2vec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.1 Introducing Edge Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.1.1 Graph Internal Weighting Approaches . . . . . . . . . . . . . . . . . . . . . . . . 46
4.1.2 Graph External Weighting Approaches . . . . . . . . . . . . . . . . . . . . . . . 51

4.2 Order-Aware RDF2vec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54


4.2.1 Motivation and Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.2.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.2.3 Order-Aware RDF2vec in Action . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.3 Alternative Walk Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.3.1 Entity Walks and Property Walks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.3.2 Further Walk Extraction Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.4 RDF2vec with Materialized Knowledge Graphs . . . . . . . . . . . . . . . . . . . . . . 66
4.4.1 Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.4.2 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.4.3 RDF2vec on Materialized Graphs in Action . . . . . . . . . . . . . . . . . . . 72
4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5 RDF2vec at Scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.1 Using Pre-trained Embeddings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.1.1 The KGvec2Go Service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.1.2 KGvec2Go in Action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.2 Training Partial RDF2vec Models with RDF2vec Light . . . . . . . . . . . . . . . 79
5.2.1 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.2.2 RDF2vec Light in Action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6 Link Prediction in Knowledge Graphs (and its Relation to RDF2vec) . . . . . . 87
6.1 A Brief Survey on the Knowledge Graph Embedding Landscape . . . . . . . 87
6.2 Knowledge Graph Embedding for Data Mining . . . . . . . . . . . . . . . . . . . . . . 91
6.2.1 Data Mining is Based on Similarity . . . . . . . . . . . . . . . . . . . . . . . . . . 92
6.2.2 How RDF2vec Projects Similar Instances Close to Each
Other . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
6.2.3 Using RDF2vec for Link Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.2.4 Link Prediction with RDF2vec in Action . . . . . . . . . . . . . . . . . . . . . 98
6.3 Knowledge Graph Embedding Methods for Link Prediction . . . . . . . . . . . 99
6.3.1 Link Prediction is Based on Vector Operations . . . . . . . . . . . . . . . . 99
6.3.2 Usage for Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.3.3 Comparing the Two Notions of Similarity . . . . . . . . . . . . . . . . . . . . . 102
6.3.4 Link Prediction Embeddings for Data Mining in Action . . . . . . . . 103
6.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.4.1 Experiments on Data Mining Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.4.2 Experiments on Link Prediction Tasks . . . . . . . . . . . . . . . . . . . . . . . . 110
6.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

7 Example Applications Beyond Node Classification . . . . . . . . . . . . . . . . . . . . . . . 119


7.1 Recommender Systems with RDF2vec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
7.1.1 An RDF2vec-Based Movie Recommender in Less than 20
Lines of Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
7.1.2 Combining Knowledge Graph Embeddings with Other
Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
7.2 Ontology Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
7.2.1 Ontology Matching by Embedding Input Ontologies . . . . . . . . . . . 126
7.2.2 Ontology Matching by Embedding External Knowledge
Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
7.3 Further Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
7.3.1 Knowledge Graph Refinement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
7.3.2 Natural Language Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
7.3.3 Information Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
7.3.4 Applications in the Biomedical Domain . . . . . . . . . . . . . . . . . . . . . . 137
7.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
8 Future Directions for RDF2vec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
8.1 Incorporating Information in Literals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
8.2 Exploiting Complex Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
8.3 Exploiting Ontologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
8.4 Dynamic and Temporal Knowledge Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . 147
8.5 Extension to other Knowledge Graph Representations . . . . . . . . . . . . . . . . . 148
8.6 Standards and Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
8.7 Embeddings and Explainability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

Appendix A: Datasets and Code Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155


1 Introduction

Abstract

In this chapter, the basic concept of a knowledge graph is introduced. We discuss why
knowledge graphs are important for machine learning and data mining tasks, we present
classic feature extraction or propositionalization techniques, which are the historical
predecessors of knowledge graph embeddings, and we show how these techniques are used
for basic node classification tasks.

1.1 What is a Knowledge Graph?

The term knowledge graph (or KG for short) has been popularized by Google in 2012, when
they announced in a blog post that their search would be based on structured knowledge
representations in the future, not only on string similarity and keyword overlap, as done until
then.¹ Generally, a knowledge graph is a mechanism in knowledge representation, where
things in the world (e.g., persons, places, or events) are represented as nodes, while their
relations (e.g., a person taking part in an event, an event happening at a place) are represented
as labeled edges between those nodes.

1.1.1 A Short Bit of History

While Google popularized the term, the idea of knowledge graphs is much older than that.
Earlier works usually used terms like knowledge base or semantic network, among others
(Ji et al. 2021). Although the exact origin of the term knowledge graph is not fully known,

1 https://blog.google/products/search/introducing-knowledge-graph-things-not/.


Hogan et al. (2021) have traced the term back to a paper from the 1970s (Schneider 1973).
In the Semantic Web community and the Linked Open Data (Bizer et al. 2011) movement,
researchers have been producing datasets that would follow the idea of a knowledge graph
for decades.
In addition to open knowledge graphs created by the research community and the already
mentioned knowledge graph used by Google, other major companies nowadays also use
knowledge graphs as a central means to represent corporate knowledge. Notable examples
include, but are not limited to, eBay, Facebook, IBM, and Microsoft (Noy et al. 2019).

1.1.2 Definitions

While a lot of researchers and practitioners claim to use knowledge graphs, the field has
long lacked a common definition of the term knowledge graph. Ehrlinger and Wöß (2016)
have collected a few of the most common definitions of knowledge graphs. In particular,
they list the following definitions:

1. A knowledge graph (1) mainly describes real-world entities and their interrelations,
organized in a graph, (2) defines possible classes and relations of entities in a schema,
(3) allows for potentially interrelating arbitrary entities with each other, and (4) covers
various topical domains. (Paulheim 2017)
2. Knowledge graphs are large networks of entities, their semantic types, properties, and
relationships between entities. (Journal of Web Semantics 2014)
3. Knowledge graphs could be envisaged as a network of all kinds of things which are
relevant to a specific domain or to an organization. (Semantic Web Company 2014)
4. A Knowledge Graph [is] an RDF graph. An RDF graph consists of a set of RDF triples
where each RDF triple (s, p, o) is an ordered set of the following RDF terms: a subject
s ∈ U ∪ B, a predicate p ∈ U , and an object o ∈ U ∪ B ∪ L. An RDF term is either a
URI u ∈ U , a blank node b ∈ B, or a literal l ∈ L. (Färber et al. 2018)
5. Knowledge, in the form of facts, [which] are interrelated, and hence, recently this
extracted knowledge has been referred to as a knowledge graph. (Pujara et al. 2013)

In addition, they synthesize their own definition, i.e.:

6. A knowledge graph acquires and integrates information into an ontology and applies a
reasoner to derive new knowledge. (Ehrlinger and Wöß 2016)

In the course of this book, we will use a very minimalistic definition of a knowledge
graph. We consider a knowledge graph a graph G = (V, E) consisting of a set of entities V
(i.e., vertices in the graph) and a set of labeled edges E ⊆ V × R × (V ∪ L), where R defines
the set of possible relation types (which can be considered edge labels), and L is a set of
literals (e.g., numbers or string values). Moreover, each entity in V can have one or more
classes assigned, where C defines the set of possible classes. Further ontological constructs,
such as defining a class hierarchy or describing relations with domains and ranges, are not
considered here.
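To make this definition concrete, the following minimal sketch represents such a graph as
plain Python data structures, using the (hypothetical) example entities that also appear in
Fig. 1.1:

# A knowledge graph as a set of (subject, relation, object) triples;
# objects may be entities from V or literals from L
V = {"John", "Mary", "Berlin"}            # entities (vertices)
R = {"likes", "bornIn", "birthdate"}      # relation types (edge labels)
E = {
    ("John", "likes", "Mary"),            # edge between two entities
    ("John", "bornIn", "Berlin"),
    ("John", "birthdate", "1997-06-08"),  # edge to a literal
}
# Each entity in V can have one or more classes from C assigned
C = {"Person", "City"}
classes = {"John": {"Person"}, "Mary": {"Person"}, "Berlin": {"City"}}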
While most of the definitions above focus more on the contents of the knowledge
graph, we, in this book, look at knowledge graphs from a more technical perspective, since
the methods discussed in this book are not bound to a particular domain. Our definition is
therefore purely technical and does not constrain the contents of the knowledge graph in
any way.

1.1.3 General-Purpose Knowledge Graphs

While the research community has come up with a large number of knowledge graphs,
there are a few which are open, large-scale, general-purpose knowledge graphs covering a
lot of domains in reasonable depth. They are therefore interesting ingredients to artificial
intelligence applications since they are ready to use and contain background knowledge for
many different tasks at hand.
One of the earliest attempts to build a general-purpose knowledge graph was Cyc, a project
started in the 1980s (Lenat 1995). The project was initiated to build a machine-processable
collection of the essence of the world’s knowledge, using a proprietary language called CycL.
After an investment of more than 2,000 person-years, the project, in the end, encompassed
almost 25M axioms and rules² – which is most likely still just a tiny fraction of the world’s
knowledge.
The example of Cyc shows that having knowledge graphs built manually by modeling
experts does not really scale (Paulheim 2018). Therefore, modern approaches usually utilize
different techniques, such as crowdsourcing and/or heuristic extraction.
Crowdsourcing knowledge graphs was first explored with Freebase (Pellissier-Tanon
et al. 2016), with the goal of establishing a large community of volunteers, comparable to
Wikipedia. To that end, the schema of Freebase was kept fairly simple to lower the entrance
barrier as much as possible. Freebase was acquired by Google in 2010 and shut down in
2014.
Wikidata (Vrandečić and Krötzsch 2014) also uses a crowd editing approach. In contrast
to Cyc and Freebase, Wikidata also imports entire large datasets, such as several
national libraries’ bibliographies. Porting the data from Freebase to Wikidata is also a long-
standing goal (Pellissier-Tanon et al. 2016).
A more efficient way of knowledge graph creation is the use of structured or semi-
structured sources. Wikipedia is a commonly used starting point for knowledge graphs such
as DBpedia (Lehmann et al. 2013) and YAGO (Suchanek et al. 2007). In these approaches,
an entity in the knowledge graph is created per page in Wikipedia, and additional axioms
are extracted from the respective Wikipedia pages using different means.

2 https://files.gotocon.com/uploads/slides/conference_13/724/original/AI_GOTO%20Lenat%20keynote%2030%20April%202019%20hc.pdf.
DBpedia mainly uses infoboxes in Wikipedia. Those are manually mapped to a pre-
defined ontology; both the ontology and the mapping are crowd-sourced using a Wiki and
a community of volunteers. Given those mappings, the DBpedia Extraction Framework
creates a graph in which each page in Wikipedia becomes an entity, and all values and links
in an infobox become attributes and edges in the graph.
YAGO uses a similar process but classifies instances based on the category structure and
WordNet (Miller 1995) instead of infoboxes. YAGO integrates various language editions of
Wikipedia into a single graph and represents temporal facts with meta-level statements, i.e.,
RDF reification.
CaLiGraph also uses information in categories but aims at converting them into formal
axioms using DBpedia as supervision (Heist and Paulheim 2019). Moreover, instances from
Wikipedia list pages are considered for populating the knowledge graph (Kuhn et al. 2016,
Paulheim and Ponzetto 2013). The result is a knowledge graph that is not only richly popu-
lated on the instance level but also has a large number of defining axioms for classes (Heist
and Paulheim 2020).
A similar approach to YAGO, i.e., the combination of information in Wikipedia and
WordNet, is used by BabelNet (Navigli and Ponzetto 2012). The main purpose of BabelNet
is the collection of synonyms and translations in various languages, so that this knowledge
graph is particularly well suited for supporting multi-language applications. Similarly, Con-
ceptNet (Speer and Havasi 2012) collects synonyms and translations in various languages,
integrating multiple third-party knowledge graphs itself.
DBkWik (Hertling and Paulheim 2018) uses the same codebase as DBpedia, but applies it
to a multitude of Wikis. This leads to a graph that has larger coverage and level of detail for
many long-tail entities and is highly complementary to DBpedia. However, the absence of
a central ontology and mappings, as well as the existence of duplicates across Wikis, which
might not be trivial to detect, imposes a number of data integration challenges not present
in DBpedia (Hertling and Paulheim 2022).
Another source of structured data are structured annotations in Web pages using tech-
niques such as RDFa, Microdata, and Microformats (Meusel et al. 2014). While the pure
collection of those could, in theory, already be considered a knowledge graph, that graph
would be rather disconnected and consist of a plethora of small, unconnected components
(Paulheim 2015) and would require additional cleanup for compensating irregular use of
the underlying schemas and shortcomings in the extraction (Meusel and Paulheim 2015). A
consolidated version of this data into a more connected knowledge graph has been published
under the name VoldemortKG (Tonon et al. 2016).
The extraction of a knowledge graph from semi-structured sources is considered easier
than the extraction from unstructured sources. However, the amount of unstructured data
exceeds the amount of structured data by a large margin.³ Therefore, extracting knowledge from
unstructured sources has also been proposed.
NELL (Carlson et al. 2010) is an example of extracting a knowledge graph from free text.
NELL was originally trained with a few seed examples and continuously runs an iterative
coupled learning process. In each iteration, facts are used to learn textual patterns to detect
those facts, and patterns learned in previous iterations are used to extract new facts, which
serve as training examples in later iterations. To improve the quality, NELL has introduced
a feedback loop incorporating occasional human feedback (Pedro and Hruschka 2012).
WebIsA (Seitner et al. 2016) also extracts facts from natural language text but focuses on
the creation of a large-scale taxonomy. For each extracted fact, rich metadata are collected,
including the sources, the original sentences, and the patterns used in the extraction of a
particular fact. That metadata is exploited for computing a confidence score for each fact
(Hertling and Paulheim 2017).
Table 1.1 depicts an overview of some of the knowledge graphs discussed above. Con-
ceptNet and WebIsA are not included, since they do not distinguish a schema and instance
level (i.e., there is no specific distinction between a class and an instance), which does not
allow for computing those metrics meaningfully. For Cyc, which is only available as a com-
mercial product today, we used the free version OpenCyc, which was available until
2017.⁴
From those metrics, it can be observed that the KGs differ in size by several orders of
magnitude. The sizes range from about 50,000 instances (for Voldemort) to more than 50
million instances (for Wikidata), so the latter is larger by a factor of 1,000. The same holds
for assertions. Concerning the linkage degree, YAGO is much more richly linked than the
other graphs.

1.2 Feature Extraction from Knowledge Graphs

When using knowledge graphs in the context of intelligent applications, they are often com-
bined with some machine learning or data mining based processing (van Bekkum et al.
2021). The corresponding algorithms, however, mostly expect tabular or propositional data
as input, not graphs, hence, information from the graphs is often transformed into a propo-
sitional form first, a process called propositionalization or feature extraction (Lavrač et al.
2020, Ristoski and Paulheim 2014a).
Particularly for the combination with machine learning algorithms, it is not only important
to have entities in a particular propositional form, but also that this propositional form fulfills
some additional criteria. In particular, proximity in the feature space should – ideally – reflect
the similarity of entities.⁵
3 Although it is hard to trace down the provenance of that number, many sources state that 80% of
all data is unstructured, such as Das and Kumar (2013).
4 It is still available, e.g., at https://github.com/asanchez75/opencyc.
5 We will discuss this in detail in Chap. 3.

Table 1.1 Basic metrics of open knowledge graphs (Heist et al. 2020)

                                     DBpedia      YAGO         Wikidata     BabelNet
# Instances                          5,044,223    6,349,359    52,252,549   7,735,436
# Assertions                         854,294,312  479,392,870  732,420,508  178,982,397
Avg. linking degree                  21.30        48.26        6.38         0.00
Median ingoing edges                 0            0            0            0
Median outgoing edges                30           95           10           9
# Classes                            760          819,292      2,356,259    6,044,564
# Relations                          1355         77           6,236        22
Avg. depth of class tree             3.51         6.61         6.43         4.11
Avg. branching factor of class tree  4.53         8.48         36.48        71.0
Ontology complexity                  SHOFD        SHOIF        SOD          SO

                                     Cyc          NELL         CaLiGraph    Voldemort
# Instances                          122,441      5,120,688    7,315,918    55,861
# Assertions                         2,229,266    60,594,443   517,099,124  693,428
Avg. linking degree                  3.34         6.72         1.48         0
Median ingoing edges                 0            0            0            0
Median outgoing edges                3            0            1            5
# Classes                            116,821      1,187        755,963      621
# Relations                          148          440          271          294
Avg. depth of class tree             5.58         3.13         4.74         3.17
Avg. branching factor of class tree  5.62         6.37         4.81         5.40
Ontology complexity                  SHOIFD       SROIF        SHOD         SH

For example, when building a movie recommendation system, we would like to recommend
movies that are similar to the movies someone already watched, and when building a system
for entity classification, we want similar entities to be assigned to the same class.
In Paulheim and Fürnkranz (2012), we have introduced a set of basic transformations
which can extract propositional features for an entity in a knowledge graph. Those techniques
include:

• Types: Create a binary feature for each entity type.
• Literals: Create a feature for each literal value (those may have different types, such as
  numeric, string, ...).
• Relations: Create a binary or numeric feature for each ingoing and/or outgoing relation.
• Qualified relations: Create a binary or numeric feature for each combination of an ingoing
  and/or outgoing relation and the corresponding entity type.
• Relations to individuals: Create a binary feature for each combination of a relation and
  the individual it connects to.
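Before turning to the ready-made implementations mentioned below, the following minimal
sketch illustrates how such features can be computed directly from an RDF graph. It uses
the rdflib library; the file name and the feature naming scheme are our own illustrative
assumptions, not part of any of the frameworks discussed in this book:

from collections import Counter
from rdflib import Graph, URIRef

# Load the example graph from Fig. 1.1 (hypothetical file name)
g = Graph()
g.parse("./intro_example.ttl")

def relation_features(entity_uri):
    """Numeric variant of the 'Relations' technique: count how often
    each relation type occurs as an outgoing or ingoing edge."""
    entity = URIRef(entity_uri)
    features = Counter()
    for _, p, _ in g.triples((entity, None, None)):  # outgoing edges
        features["out_" + str(p)] += 1
    for _, p, _ in g.triples((None, None, entity)):  # ingoing edges
        features["in_" + str(p)] += 1
    return features

print(relation_features("http://rdf2vec.org/book/example1#John"))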

Implementations of these techniques exist in the original FeGeLOD framework for Weka
(Paulheim and Fürnkranz 2012), the RapidMiner Linked Open Data Extension (Ristoski
et al. 2015), and the Python kgextension (Bucher et al. 2021). The latter technique, however,
is usually not used due to a rapid explosion of the search space.
Figure 1.1 shows a simple knowledge graph describing three persons and their relations
among each other, as well as their relation to other objects. If we apply the above techniques
for creating propositional representations of the three persons, we arrive at the representation
shown in Table 1.2.
Fig. 1.1 A simple knowledge graph. The dashed line marks the delineation of the schema (upper
part) and the instance level (lower part)
Table 1.2 A simple propositionalization of the knowledge graph shown in Fig. 1.1. We show the
numeric variant of the relations and qualified relations

Technique                  Attribute          John        Mary        Julia
Types                      Person             True        True        True
Literals                   birthdate          1997-06-08  1996-11-14  NULL
Relations                  likes              2           1           1
                           birthdate          1           1           0
                           bornIn             1           0           1
                           livesIn            0           1           1
Qualified relations        likes.Person       1           0           0
                           likes.Food         1           1           1
                           bornIn.City        1           0           1
                           livesIn.City       0           1           1
Relations to individuals   bornIn.{Berlin}    1           0           1
                           bornIn.{Paris}     0           1           0
                           livesIn.{Berlin}   0           1           0
                           livesIn.{Paris}    0           0           1
                           likes.{Mary}       1           0           0
                           likes.{Pizza}      1           1           0
                           likes.{Sushi}      0           0           1

A corresponding code snippet, using the Python knowledge graph extension (Bucher et al.
2021), is shown in Listing 1.1.⁶
There are a few observations we can make here. First, the graph is not precisely evaluated
under Open World Semantics, which is the semantics that holds for typical knowledge graphs
in RDF (Gandon et al. 2011) and in particular those discussed above, like DBpedia, Wikidata,
and the like. For relation features, for example, the features represent if or how often a relation
exists in the graph, not necessarily how often it exists in the real world. Since knowledge
graphs may be incomplete (Issa et al. 2021), and that incompleteness might not be evenly
distributed, some biases might be introduced here. In the example above, we create a feature
likes.Person with a value of 1 for John and a value of 0 for Mary and Julia, but this does not
necessarily mean that there are no persons that Mary or Julia like, nor that there is not more
than one person that John likes.
Second, the number of features is not limited. The more classes and relation types are
defined in an ontology and used in a graph, the more features are generated. This may easily
lead to very high-dimensional feature spaces, which are often suboptimal for downstream
processors like prediction algorithms.
6 Code examples, as well as other additional materials, are available online at http://rdf2vec.org/book/.
Listing 1.1 Example for propositionalization with the kgextension package


from kgextension.sparql_helper import LocalEndpoint
import pandas as pd

# Load graph
MyGraph = LocalEndpoint(file_path="./intro_example.ttl")
MyGraph.initialize()

# Create data frame
df = pd.DataFrame({
    'uri': ['http://rdf2vec.org/book/example1#John',
            'http://rdf2vec.org/book/example1#Mary',
            'http://rdf2vec.org/book/example1#Julia']
})

# Create features - example 1: direct types
from kgextension.generator import direct_type_generator
df_types = direct_type_generator(df, "uri", endpoint=MyGraph)

# Create features - example 2: literals
from kgextension.generator import data_properties_generator
df_data_properties = data_properties_generator(df, "uri", endpoint=MyGraph)

# Create features - example 3: relations
from kgextension.generator import unqualified_relation_generator
df_relations = unqualified_relation_generator(df, "uri", endpoint=MyGraph, result_type="count")

# Create features - example 4: qualified relations
from kgextension.generator import qualified_relation_generator
df_qrelations = qualified_relation_generator(df, "uri", endpoint=MyGraph, result_type="count")

For the qualified relations, the number of features often
grows exponentially with the number of instances, and the number of features generated can
exceed the number of instances by a factor of more than 100, as we have shown in Ristoski
and Paulheim (2016). This means that for a dataset with 2,000 instances, the number of
features can already exceed 200,000, making it hard to make sense of that data due to the
curse of dimensionality (Verleysen and François 2005). While some approaches for post hoc
filtering of the generated features have been proposed (Ristoski and Paulheim 2014b), they
usually require generating the full feature space first, which may be a way to circumvent the
curse of dimensionality, but does not really remedy the problem of scalability.
Table 1.3 Computational complexity of different types of features. T: number of types, P = DP +
OP: number of properties, DP: number of datatype properties, OP: number of object properties, I:
number of individuals

Strategy                   Complexity
Types                      O(T)
Literals                   O(DP)
Relations                  O(P)
Qualified relations        O(OP × T)
Relations to individuals   O(OP × I)

While creating relations to individuals is a possible technique of propositionalization, it
can hardly be used in practice, again for the reason of scalability. In this simple example,
we already have seven valid combinations of a predicate and an instance, twice as many
as entities considered. For a real knowledge graph, this number would grow even faster
than that of qualified relations. Therefore, one is often bound to direct types, relations, and
qualified relations. The computational complexity (here: the upper bound of the number of
features that are generated) for each of the strategies is shown in Table 1.3. When looking
back at the numbers in Table 1.1, it becomes obvious that some of the techniques can cause
issues in terms of scalability.
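To put these bounds into perspective with the numbers from Table 1.1: if we, for the sake of
a rough estimate, treated all of Wikidata’s 6,236 relations as object properties, the upper
bound for qualified relations would already be OP × T ≈ 6,236 × 2,356,259 ≈ 1.5 × 10¹⁰
candidate features, and the bound for relations to individuals, OP × I ≈ 6,236 × 52,252,549
≈ 3.3 × 10¹¹, would be another order of magnitude larger. Only a small fraction of those
combinations actually occurs in the graph, but the estimate illustrates why the latter strategies
quickly become infeasible on large knowledge graphs.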
When looking at the representation with those groups of features, one might conclude
that Mary is more similar to Julia than to John, since Mary and Julia have the same value
for five features, while Mary and John only share values of two features. When looking at
the graph, on the other hand, one might come to the conclusion, though, that Mary and John
are more similar, since both like Pizza, and both are related to Berlin.
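This comparison can be reproduced directly on the feature vectors from Table 1.2. The
following minimal sketch counts, for the relation and qualified relation features, on how
many features two entities agree:

import numpy as np

# Feature vectors from Table 1.2 (relations and qualified relations):
# [likes, birthdate, bornIn, livesIn,
#  likes.Person, likes.Food, bornIn.City, livesIn.City]
john  = np.array([2, 1, 1, 0, 1, 1, 1, 0])
mary  = np.array([1, 1, 0, 1, 0, 1, 0, 1])
julia = np.array([1, 0, 1, 1, 0, 1, 1, 1])

def matching_features(a, b):
    """Number of features on which two entities have the same value."""
    return int((a == b).sum())

print(matching_features(mary, julia))  # 5 -> Mary appears closer to Julia
print(matching_features(mary, john))   # 2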
This example shows that those relations to individuals are very relevant for downstream
tasks. In the example of movie recommendation mentioned above, movies are considered
similar because they share the same director, actor(s), or genre. However, they are hard to
exploit in practice. This observation was one of the key motivating points for developing
RDF2vec.

1.3 Node Classification in RDF

Node classification is the task of assigning a label to a node in a graph (not necessarily an
RDF graph, but in the context of this book, we will look only at RDF graphs). The label may
be the ontological class of the entity, which is often considered in the context of knowledge
graph completion (Paulheim 2017), but it can be any other binary or n-ary label. For example,
in a content-based recommender system, where each recommendable item is represented as
a node in a knowledge graph, the recommendation could be modeled as a node classification
task (i.e., given a user u, predict whether or not to recommend item i) (Ristoski et al. 2019,
Rosati et al. 2016). A related problem is node regression, where a numerical label (e.g., a
rating for an item) is to be predicted.
As an example, we use an excerpt from DBpedia, which contains 200 bands, 100 each
from the genres rock and soul. The node classification target is to predict the genre.⁷
Listing 1.2 shows the classification using three of the propositionalization strategies
discussed above, using a standard feed-forward neural network as a downstream classifier.
The results of node classification using those approaches are shown in Table 1.4.

Table 1.4 Node classification results with different propositionalization techniques

Generator               # of features   Accuracy
Direct types            12              0.545 ± 0.057
Unqualified relations   19              0.505 ± 0.111
Qualified relations     163             0.530 ± 0.108
Combined                194             0.525 ± 0.081

We can observe that none of the approaches works significantly better than guessing, which, for a
balanced binary classification problem like this, would yield an accuracy of 0.5. Obviously,
the propositionalization approaches at hand cannot extract any useful features from the
graph. At the same time, the number of features extracted is already considerable and, when
using all three generators altogether, almost as high as the number of instances.
Nevertheless, Fig. 1.2 indicates that there might be some signals which should be useful
for the task at hand (e.g., other genres, record labels, associated artists, etc.). However, those
would require the inclusion of information about entities (i.e., individual genres, record
labels, artists) in the features, which, as discussed above, none of the current propositional-
ization techniques do.
Figure 1.3 shows an example decision tree trained on the example dataset. By analyzing
some of the paths, it becomes obvious why the models trained with the propositionalization
techniques perform that badly. For example, the leftmost leaf node essentially says that bands
for which no genre information is given are classified rock bands, whereas the rightmost
leaf node expresses that bands for which at least two genres, one hometown, and one record
label are given are classified as soul bands. This indicates that the classifier rather picks up
on some statistical artifacts in the knowledge graph (here, rock bands seem to be described
in less detail than soul bands), but does not really provide insights beyond that and is not
capable of expressing the essence of what makes a band a rock or a soul band.

7 The prediction target, i.e., the target artists’ genres, has been removed from the DBpedia excerpt.
Details on the dataset construction can be found in Appendix A.1.
Listing 1.2 Example for node classification using classic propositionalization with the kgextension
package
# Load data
from kgextension.sparql_helper import LocalEndpoint
MyGraph = LocalEndpoint(file_path="./artists_graph.nt")
MyGraph.initialize()

# Create data frame, split into features and label
import pandas as pd
df = pd.read_csv('./bands_labels.csv', sep="\t")
dfX = df[['Band']]
dfY = df[['Genre']]

# Create features - use three generators
from kgextension.generator import direct_type_generator
from kgextension.generator import unqualified_relation_generator
from kgextension.generator import qualified_relation_generator
dfX = direct_type_generator(dfX, "Band", endpoint=MyGraph)
dfX = unqualified_relation_generator(dfX,
    "Band", endpoint=MyGraph, result_type="count")
dfX = qualified_relation_generator(dfX,
    "Band", endpoint=MyGraph, result_type="count")

# Train neural network, evaluate in 10-fold cross validation
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score
dfX = dfX.iloc[:, 1:]
clf = MLPClassifier(max_iter=1000)
scores = cross_val_score(clf, dfX, dfY.values.ravel(), cv=10)
scores.mean()
scores.std()

Fig. 1.2 Excerpt of the band node classification dataset. The prediction target labels are shown in
grey rectangular boxes. Note that not all bands in the dataset are labeled examples
Fig. 1.3 Example decision tree trained on the node classification example

1.4 Conclusion

In this chapter, we have seen that classic propositionalization has some shortcomings – the
search space can get large very quickly, while, at the same time, the expressivity of the
features (and, hence, the downstream models) is very limited, in particular since relations
to individuals are not reflected very well (and cannot be easily expressed without causing
an exponential explosion of the search space).
The idea of RDF2vec is to address these two shortcomings of classic propositionalization
approaches – to create rich representations which can also reflect relations to entities, while,
at the same time, limiting the dimensionality of the feature space. With RDF2vec, it is
possible to create propositional representations which have a low and controllable number
of features (typically, 200-500 features are used), and, at the same time, capture a large
amount of the information available for the entities in a knowledge graph.
References

Bizer C, Heath T, Berners-Lee T (2011) Linked data: the story so far. In: Semantic services, interop-
erability and web applications: emerging concepts, IGI global, pp 205–227
Bucher TC, Jiang X, Meyer O, Waitz S, Hertling S, Paulheim H (2021) Scikit-learn pipelines meet
knowledge graphs. In: European semantic web conference. Springer, pp 9–14
Carlson A, Betteridge J, Wang RC, Hruschka Jr ER, Mitchell TM (2010) Coupled semi-supervised
learning for information extraction. In: Proceedings of the third ACM international conference
on Web search and data mining. ACM, New York, pp 101–110. https://doi.org/10.1145/1718487.1718501
Das TK, Kumar PM (2013) Big data analytics: a framework for unstructured data analysis. Int J Eng
Sci Technol 5(1):153
Ehrlinger L, Wöß W (2016) Towards a definition of knowledge graphs. In: SEMANTiCS
Färber M, Bartscherer F, Menne C, Rettinger A (2018) Linked data quality of DBpedia, Freebase,
OpenCyc, Wikidata, and YAGO. Semant Web 9(1):77–129
Gandon F, Krummenacher R, Han SK, Toma I (2011) The resource description framework and its
schema
Heist N, Paulheim H (2019) Uncovering the semantics of Wikipedia categories. In: International
semantic web conference. Springer, pp 219–236
Heist N, Paulheim H (2020) Entity extraction from Wikipedia list pages. In: Extended semantic web
conference
Heist N, Hertling S, Ringler D, Paulheim H (2020) Knowledge graphs on the web – an overview
Hertling S, Paulheim H (2017) WebIsALOD: providing hypernymy relations extracted from the web as
linked open data. In: International semantic web conference. Springer, pp 111–119
Hertling S, Paulheim H (2018) DBkWik: a consolidated knowledge graph from thousands of wikis.
In: 2018 IEEE international conference on big knowledge (ICBK). IEEE, pp 17–24
Hertling S, Paulheim H (2022) DBkWik++ – multi-source matching of knowledge graphs. In: Knowl-
edge graphs and semantic web: 4th Iberoamerican conference and third Indo-American conference,
KGSWC 2022, Madrid, Spain, November 21–23, 2022, proceedings. Springer, pp 1–15
Hogan A, Blomqvist E, Cochez M, d’Amato C, Melo Gd, Gutierrez C, Kirrane S, Gayo JEL, Navigli
R, Neumaier S et al (2021) Knowledge graphs. ACM Comput Surv (CSUR) 54(4):1–37
Issa S, Adekunle O, Hamdi F, Cherfi SSS, Dumontier M, Zaveri A (2021) Knowledge graph com-
pleteness: a systematic literature review. IEEE Access 9:31322–31339
Ji S, Pan S, Cambria E, Marttinen P, Philip SY (2021) A survey on knowledge graphs: representation,
acquisition, and applications. IEEE Trans Neural Netw Learn Syst 33(2):494–514
Journal of Web Semantics (2014) JWS special issue on knowledge graphs. http://www.websemanticsjournal.org/2014/09/cfp-special-issue-on-knowledge-graphs.html
Kuhn P, Mischkewitz S, Ring N, Windheuser F (2016) Type inference on Wikipedia list pages.
Informatik 2016
Lavrač N, Škrlj B, Robnik-Šikonja M (2020) Propositionalization and embeddings: two sides of the
same coin. Mach Learn 109(7):1465–1507
Lehmann J, Isele R, Jakob M, Jentzsch A, Kontokostas D, Mendes PN, Hellmann S, Morsey M, van
Kleef P, Auer S, Bizer C (2013) DBpedia – a large-scale, multilingual knowledge base extracted
from Wikipedia. Semant Web J 6(2). https://doi.org/10.3233/SW-140134
Lenat DB (1995) CYC: a large-scale investment in knowledge infrastructure. Commun ACM
38(11):33–38. https://doi.org/10.1145/219717.219745
Meusel R, Paulheim H (2015) Heuristics for fixing common errors in deployed schema.org microdata.
In: European semantic web conference. Springer, pp 152–168
Meusel R, Petrovski P, Bizer C (2014) The WebDataCommons microdata, RDFa and microformat dataset
series. In: International semantic web conference. Springer, pp 277–292
Miller GA (1995) WordNet: a lexical database for English. Commun ACM 38(11):39–41
Navigli R, Ponzetto SP (2012) BabelNet: the automatic construction, evaluation and application of a
wide-coverage multilingual semantic network. Artif Intell 193:217–250
Noy N, Gao Y, Jain A, Narayanan A, Patterson A, Taylor J (2019) Industry-scale knowledge graphs:
lessons and challenges: five diverse technology companies show how it’s done. Queue 17(2):48–75
Paulheim H (2015) What the adoption of schema.org tells about linked open data. In: Joint proceedings
of USEWOD and PROFILES
Paulheim H (2017) Knowledge graph refinement: a survey of approaches and evaluation methods.
Semant Web 8(3):489–508
Paulheim H (2018) How much is a triple? Estimating the cost of knowledge graph creation. In: ISWC
2018 posters and demonstrations, industry and blue sky ideas tracks
Paulheim H, Fürnkranz J (2012) Unsupervised generation of data mining features from linked open
data. In: Proceedings of the 2nd international conference on web intelligence, mining and semantics,
pp 1–12
Paulheim H, Ponzetto SP (2013) Extending DBpedia with Wikipedia list pages. NLP-DBPEDIA@
ISWC 13
Pedro SD, Hruschka ER (2012) Conversing learning: active learning and active social interaction for
human supervision in never-ending learning systems. In: Ibero-American conference on artificial
intelligence. Springer, pp 231–240
Pellissier-Tanon T, Vrandečić D, Schaffert S, Steiner T, Pintscher L (2016) From freebase to wikidata:
the great migration. In: Proceedings of the 25th international conference on world wide web, pp
1419–1428
Pujara J, Miao H, Getoor L, Cohen W (2013) Knowledge graph identification. In: International
semantic web conference. Springer, pp 542–557
Ristoski P, Paulheim H (2014a) A comparison of propositionalization strategies for creating features
from linked open data. In: Linked data for knowledge discovery 6
Ristoski P, Paulheim H (2014b) Feature selection in hierarchical feature spaces. In: International
conference on discovery science. Springer, pp 288–300
Ristoski P, Paulheim H (2016) RDF2vec: RDF graph embeddings for data mining. In: International
semantic web conference. Springer, pp 498–514
Ristoski P, Bizer C, Paulheim H (2015) Mining the web of linked data with RapidMiner. J Web Semant
35:142–151
Ristoski P, Rosati J, Di Noia T, De Leone R, Paulheim H (2019) RDF2vec: RDF graph embeddings
and their applications. Semant Web 10(4):721–752
Rosati J, Ristoski P, Di Noia T, Leone Rd, Paulheim H (2016) RDF graph embeddings for content-based
recommender systems. CEUR Workshop Proc RWTH 1673:23–30
Schneider EW (1973) Course modularization applied: the interface system and its implications for
sequence control and data analysis
Seitner J, Bizer C, Eckert K, Faralli S, Meusel R, Paulheim H, Ponzetto SP (2016) A large database of
hypernymy relations extracted from the web. In: Proceedings of the tenth international conference
on language resources and evaluation (LREC 2016), pp 360–367
Semantic Web Company (2014) From taxonomies over ontologies to knowledge graphs. https://semantic-web.com/from-taxonomies-over-ontologies-to-knowledge-graphs/
Speer R, Havasi C (2012) Representing general relational knowledge in ConceptNet 5. In: LREC, pp
3679–3686
Suchanek FM, Kasneci G, Weikum G (2007) YAGO: a core of semantic knowledge unifying WordNet
and Wikipedia. In: 16th international conference on World Wide Web. ACM, New York, pp 697–706.
https://doi.org/10.1145/1242572.1242667
Tonon A, Felder V, Difallah DE, Cudré-Mauroux P (2016) VoldemortKG: mapping schema.org and
web entities to linked open data. In: International semantic web conference. Springer, pp 220–228
van Bekkum M, de Boer M, van Harmelen F, Meyer-Vitali A, ten Teije A (2021) Modular design patterns
for hybrid learning and reasoning systems. Appl Intell 51(9):6528–6546
Verleysen M, François D (2005) The curse of dimensionality in data mining and time series prediction.
In: International work-conference on artificial neural networks. Springer, pp 758–770
Vrandečić D, Krötzsch M (2014) Wikidata: a free collaborative knowledge base. Commun ACM
57(10):78–85. https://doi.org/10.1145/2629489

2 From Word Embeddings to Knowledge Graph Embeddings

Abstract

Word embedding techniques have been developed to assign words to vectors in a vector
space. One of the earliest such methods was word2vec, published in 2013 – and embed-
dings have gathered a tremendous uptake in the natural language processing community
since then. Since RDF2vec is based on word2vec, we take a closer look at word2vec in
this chapter. We explain how word2vec has been developed to represent words as vectors,
and we discuss how this approach can be adapted to knowledge graphs by performing
random graph walks, yielding the basic version of RDF2vec. We explain the CBOW
and SkipGram variants of basic RDF2vec, revisiting the node classification tasks used in
Chap. 1.

2.1 Word Embeddings with word2vec

The previous chapter has already introduced the idea of feature extraction from a knowledge
graph. Feature extraction has also been used in other fields, such as Natural Language
Processing (NLP), e.g., by means of extracting relevant words using POS taggers and/or
keyphrase extraction (Scott and Matwin 1999), or image processing, e.g., by extracting
shapes and color histograms (Kumar and Bhatia 2014, Nixon and Aguado 2019).
In contrast to those approaches, representation learning or feature learning are approaches
that input raw data (e.g., graphs, texts, images) into a machine learning pipeline directly,
and the first steps of the pipeline create a representation which is suitable for the task at
hand (Bengio et al. 2013). There are supervised approaches, which learn a representation
that is suitable for a problem at hand, and unsupervised approaches, which learn a represen-
tation that is usable across different downstream problems. A typical example of supervised
approaches is convolutional neural networks, where each convolution layer creates a more

abstract representation of the input data, and the neurons in those layers act as feature detec-
tors (Schmidhuber 2015). Typical unsupervised approaches are autoencoders, which try to
learn a more compact representation of the input data, which can then be used to recon-
struct the original data with as little error as possible (Bengio et al. 2013). Both supervised
and unsupervised methods have been applied to knowledge graphs as well. While RDF2vec,
which is discussed in this book, is an unsupervised method, the most well-known supervised
methods are graph neural networks (Scarselli et al. 2008), which have also been applied to
knowledge graphs under the name of relational graph convolutional networks (Schlichtkrull
et al. 2018).
NLP is probably one of the fields which have changed the most since the advent of
(neural) representation learning. With the increasing popularity of embedding methods,
previous approaches for text representation, like bag of words, are barely used anymore.
Instead, using vector-based representations of words, as well as smaller and larger chunks
of text, has become the dominant representation paradigm in NLP (Khurana et al. 2022).
The general idea underlying word embeddings is that similar words appear in similar
contexts, as phrased by John Rupert Firth in a 1957 article: a word is characterized by the
company it keeps (Firth 1957). For example, consider the two sentences:

Jobs, Wozniak, and Wayne founded Apple Computer Company in April 1976.

and

Google was officially founded as a company in January 2006.

Here, both Apple and Google appear with similar context words (e.g., company and
founded), hence, it can be assumed that they are somewhat similar.1
Hence, when representing words, creating similar representations for similar contexts
should lead to similar words being represented by similar vectors. This idea was taken up
by Bengio et al. (2000) for training neural networks to predict a word from its context. By
using shared weight matrices for each word, they could come up with embedding vectors
for words. This approach led to one of the most famous word embedding approaches, called
word2vec (Mikolov et al. 2013a).
word2vec comes in two variants: continuous bag of words (CBOW) is closely aligned to the
neural language model by Bengio et al. (2000) and tries to predict a word from its context,
while skip-gram (SG) is organized reversely and tries to predict the context of a word from
the word itself. In both cases, a projection layer is used to learn a representation of a word,
as shown in Fig. 2.1.

1 Of course, the example is a bit simplistic, and one would in fact not be able to conclude the similarity
of the two terms Google and Apple from just those two sentences. However, when using a large corpus
of sentences containing both words, one will be able to observe a similar distribution of words in the
two terms’ contexts.

Fig. 2.1 The two basic architectures of word2vec: (a) CBOW, (b) SG (Ristoski and Paulheim 2016)

The CBOW model predicts target words from context words within a given window. The
model architecture is shown in Fig. 2.1a. The input layer is comprised of all the surrounding
words for which the input vectors are retrieved from the input weight matrix, averaged, and
projected in the projection layer. Then, using the weights from the output weight matrix,
a score for each word in the vocabulary is computed, which is the probability of the word
being a target word. Formally, given a sequence of training words w1 , w2 , w3 , . . . , wT , and
a context window c, the objective of the CBOW model is to maximize the average log
probability:
$$\frac{1}{T} \sum_{t=1}^{T} \log p(w_t \mid w_{t-c}, \ldots, w_{t+c}), \tag{2.1}$$

where the probability p(w_t | w_{t-c} ... w_{t+c}) is calculated using the softmax function:

$$p(w_t \mid w_{t-c}, \ldots, w_{t+c}) = \frac{\exp(\bar{v}^{\top} v'_{w_t})}{\sum_{w=1}^{V} \exp(\bar{v}^{\top} v'_w)}, \tag{2.2}$$

where v'_w is the output vector of the word w, V is the complete vocabulary of words, and v̄
is the averaged input vector of all the context words:

$$\bar{v} = \frac{1}{2c} \sum_{-c \le j \le c,\, j \ne 0} v_{w_{t+j}} \tag{2.3}$$

The skip-gram model does the inverse of the CBOW model and tries to predict the context
words from the target words (Fig. 2.1b). More formally, given a sequence of training words
w1 , w2 , w3 , . . . , wT , and a context window c, the objective of the skip-gram model is to
maximize the following average log probability:

$$\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t), \tag{2.4}$$

where the probability p(w_{t+j} | w_t) is calculated using the softmax function:

$$p(w_o \mid w_i) = \frac{\exp({v'_{w_o}}^{\top} v_{w_i})}{\sum_{w=1}^{V} \exp({v'_{w}}^{\top} v_{w_i})}, \tag{2.5}$$

where v_w and v'_w are the input and the output vector of the word w, and V is the complete
vocabulary of words.
In both cases, calculating the softmax function is computationally inefficient, as the cost
for computing is proportional to the size of the vocabulary. Therefore, two optimization
techniques have been proposed, i.e., hierarchical softmax and negative sampling (Mikolov
et al. 2013b). Empirical studies have shown that negative sampling leads to a better performance than hierarchical softmax in most cases; its performance depends on the selected negative samples, and it comes with a higher runtime.
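To make the two variants tangible, the following minimal sketch trains a CBOW and a skip-gram model with the gensim library (the same library used by the RDF2vec implementations discussed later in this chapter). The toy corpus and the hyperparameter values are illustrative and not taken from this book's experiments:

# Minimal sketch: training CBOW and skip-gram word2vec models with gensim.
# The corpus and hyperparameters are illustrative only.
from gensim.models import Word2Vec

sentences = [
    ["jobs", "wozniak", "founded", "apple", "company"],
    ["google", "was", "founded", "as", "a", "company"],
]

# sg=0 selects CBOW, sg=1 selects skip-gram; negative=5 enables negative sampling
cbow_model = Word2Vec(sentences, sg=0, vector_size=50, window=5, min_count=1, negative=5)
sg_model = Word2Vec(sentences, sg=1, vector_size=50, window=5, min_count=1, negative=5)

# After training, each token maps to a dense vector
print(sg_model.wv["apple"][:5])
print(sg_model.wv.similarity("apple", "google"))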
Recent advances of word embeddings try to capture meaning also in a more fine-grained
and contextual way (e.g., distinguishing the Apple company from the fruit named apple in the
example above), as done, e.g., in BERT (Devlin et al. 2018), or creating cross-lingual word
embeddings (Ruder et al. 2019). Besides their very good performance on many tasks, word embeddings have also become rather popular because, belonging to the unsupervised category, they can be trained once and published for further use. Today, there is an abundance
of pre-trained word embeddings for many languages, domains, and text genres available
online. This makes it easy to develop applications using such word embeddings without
requiring extensive computational resources to actually compute the embeddings.
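For illustration, such pre-trained vectors can be loaded with gensim's downloader API. The model name below refers to one of the pre-trained sets published through gensim; it serves as a hedged example rather than a recommendation:

# Sketch: loading pre-trained word vectors instead of training them from scratch.
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")  # downloaded on first use
print(wv.most_similar("apple", topn=3))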

2.2 Representing Graphs as Sequences

Techniques for word embeddings work on texts, i.e., sequences of words. In order to apply
them to knowledge graphs, those need to be represented as sequences first.
In Chap. 1, we have used a very simple definition of knowledge graphs, i.e., representing
a knowledge graph as a set of triples. By considering entities E and relations R as “words”,
the set of triples can also be thought of as a set of three-word “sentences”. Hence, applying
word embedding methods on those sets would already provide us with a basic embedding
vector for each entity. This approach is taken, e.g., by Wembedder, an embedding service
for Wikidata entities (Nielsen 2017).
While this approach is very straightforward, it will most often not perform very well.
The reason is that the amount of context information of an entity captured by just looking at
single triples is very limited. Looking back at the example in Fig. 1.2, the only information
in the direct neighborhood of the entity Rustic Overtones are the two associated bands.
However, the actually interesting information to classify the entity at hand would rather be
the genre and record label of those associated bands. By only considering single triples,
however, that information is not encoded in any of the “sentences”, and, hence, not picked
up by the word embedding method applied to them.

To overcome this issue, we enhance the context of an entity by not only looking at single
triples but longer sequences S instead. Formally, the set of all two-hop sequences S2hop can be extracted from the set of all triples E ⊆ V × R × V as follows2:

$$S_{2hop} := \{(v_1, r_1, v_2, r_2, v_3) : (v_1, r_1, v_2) \in E \wedge (v_2, r_2, v_3) \in E\} \tag{2.6}$$

Longer sequences can be defined accordingly.3 Generally, a walk of length n (for an even
number n) can be defined as a sequence of entities and relations as follows:

$$walk_n := w_{-\frac{n}{2}}, w_{-\frac{n}{2}+1}, \ldots, w_{-1}, w_0, w_1, \ldots, w_{\frac{n}{2}-1}, w_{\frac{n}{2}} \tag{2.7}$$

where

$$w_i \in \begin{cases} V & \text{if } i \text{ is even} \\ R & \text{if } i \text{ is odd} \end{cases} \tag{2.8}$$
It is worth noticing that, given that the set of triples is a mathematical set and hence free from duplicates, E can be fully reconstructed from S2hop (and also from all walk sets longer than two hops), i.e., S2hop is a representation of the knowledge graph that contains all its information.
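As a minimal sketch (the function and variable names are illustrative and not taken from any RDF2vec implementation), Eq. 2.6 can be realized directly in Python:

# Sketch: enumerating all two-hop sequences S_2hop from a set of triples (Eq. 2.6).
def two_hop_sequences(triples):
    # Index the outgoing edges per entity for a fast lookup of the second hop
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))
    # Join every triple (v1, r1, v2) with every outgoing edge (r2, v3) of v2
    return {
        (v1, r1, v2, r2, v3)
        for (v1, r1, v2) in triples
        for (r2, v3) in by_subject.get(v2, [])
    }

triples = {("John", "likes", "Mary"), ("Mary", "livesIn", "Berlin")}
print(two_hop_sequences(triples))  # {('John', 'likes', 'Mary', 'livesIn', 'Berlin')}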
A full enumeration of sequences, even with a larger number of hops, is possible for
smaller knowledge graphs. The number of sequences is n · d^h, where n is the number of entities, d is the average node degree, and h is the number of hops.
For larger and more densely connected graphs, d^h can quickly become large. For example,
using 4-hop sequences for DBpedia, which has a linkage degree of 21.3, as shown in Chap. 1,
one would extract more than 200,000 sequences per entity, i.e., more than a trillion sequences
in total. In comparison, typical word2vec models are trained on Wikipedia, which has roughly
160 million sentences, i.e., a couple of orders of magnitude less.4 Thus, using all sequences
does not scale.
Therefore, it is a common practice to sample S by using random walks. Instead of enu-
merating all walks, a fixed number of random walks is started from each node. Thus, the
number of extracted sequences grows only linearly in the number of entities in the graph,
independently of the length of sequences extracted and the degree of the knowledge graph.
Depending on the implementation, the result is either a set or a multiset of sequences. In
the first case, the set of extracted walks W is a subset of S. In the second case, walks can also

2 You may note that literals are not considered here. Like many other knowledge graph embedding approaches, RDF2vec only uses relations between entities and does not utilize literal values.
3 It is important to point out that not all implementations of RDF2vec share the same terminology.
The two-hop sequence above would be referred to as a “walk of length 2” (i.e., counting only nodes)
by some implementations, while others would consider it a “walk of length 4” (i.e., counting nodes
and edges). In this book, we follow the latter terminology.
4 According to https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia, the English Wikipedia
has roughly 4 billion words. Assuming 25 words per sentence, as measured by Jatowt and Tanaka
(2012), this would correspond to 160 million sentences.

Table 2.1 Subset of the walks extracted for the example in Fig. 1.1. The table only shows the walks
starting in the entities John, Mary, and Julia
v1 r1 v2 r2 v3
John isA Person – –
John bornIn Berlin isA City
John likes Pizza isA Food
John likes Mary isA Person
John likes Mary livesIn Berlin
Mary isA Person – –
Mary livesIn Berlin isA City
Mary likes Pizza isA Food
Julia isA Person – –
Julia bornIn Paris isA City
Julia livesIn Paris isA City
Julia likes Sushi isA Food

be duplicated – which is particularly the case for low-degree nodes. Hence, multiset-based
approaches can introduce a representation bias, increasing the influence of such low-degree
nodes. On the other hand, they may also capture the distributions in the graph better.
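The following minimal sketch illustrates this walk extraction strategy; the function and variable names are illustrative and do not stem from a particular implementation:

# Sketch: extracting a fixed number of random walks per starting entity.
import random
from collections import defaultdict

def extract_random_walks(triples, start, n_walks, hops, seed=None):
    rng = random.Random(seed)
    # Index the outgoing edges per entity
    outgoing = defaultdict(list)
    for s, p, o in triples:
        outgoing[s].append((p, o))
    walks = []
    for _ in range(n_walks):
        walk, node = [start], start
        for _ in range(hops):
            if not outgoing[node]:
                break  # dead end: entity without outgoing edges
            p, o = rng.choice(outgoing[node])
            walk.extend([p, o])
            node = o
        walks.append(tuple(walk))
    return walks

Collecting the walks in a list, as above, yields the multiset variant discussed in the previous paragraph; deduplicating them (e.g., by returning set(walks)) yields the set variant.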
Table 2.1 shows a set of walks extracted for the example in Fig. 1.1 in Chap. 1. It depicts
the walks started in the three nodes John, Mary, and Julia (while in general, walks would
be generated for all entities in the graph).
When we considered this example with propositionalization methods, we saw that, countering our intuition, Mary was more similar to Julia than to John (cf. Sect. 1.2). On the other hand, considering the walk representation, out of the three walks created from the entity Mary, two are completely identical to the ones created for John (except for the first entity, of course), and in the third, only one element differs. In contrast, only one walk for
Mary is identical to a walk for Julia, and the other two differ in at least one element. We can
hence assume that a model built on those walks should be able to capture those fine-grained
similarities a lot better.5

5 For the sake of clarity, it should be stated that this argument is a bit simplified since RDF2vec
uses all walks which contain an entity to build that entity’s representation, while here, we have only
looked at the walks where the entity appears in the v1 position. However, the example still shows that
the context of an entity is better captured by the walks than by the propositionalization techniques
discussed in Chap. 1.

2.3 Learning Representations from Graph Walks

As discussed above, walks or sequences extracted from knowledge graphs can be considered
as yet another way to represent those graphs. If we start enough walks from each entity, we
will end up with a set of sequences where each entity and each relation in the graph appears
at least once.
Language modeling techniques such as word2vec use such sequences of tokens (i.e.,
words for natural language, entities and relations for sequences extracted from knowledge
graphs) to learn a representation for each token. This means that we can use them to learn
representations for entities and relations in knowledge graphs.
Internally, word2vec slides a window of a fixed size over the walks as they are shown in
Table 2.1. Each element in each row becomes the focus word in turn (marked as w(t) in Fig. 2.1), with the other elements within the window becoming the context words (marked w(t − 2), w(t − 1), etc. in Fig. 2.1). In most RDF2vec implementations, the standard
value of 5 of the gensim implementation is used for the window size, i.e., for each entity, an
inbound and an outbound triple are used to learn the representations, but further away entities
are not considered. This value, however, is configurable, and for graphs using more complex
representational patterns, a larger window size could yield better vector representations (at the price of a higher computational cost).
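As a small sketch (with illustrative walks and parameter values), feeding such walks into gensim's word2vec looks as follows; entities and relations are simply treated as tokens:

# Sketch: training word2vec on graph walks; entities and relations are tokens.
from gensim.models import Word2Vec

walks = [
    ["John", "isA", "Person"],
    ["John", "bornIn", "Berlin", "isA", "City"],
    ["Mary", "livesIn", "Berlin", "isA", "City"],
]

# window=5: up to five tokens on each side of the focus token serve as context
model = Word2Vec(walks, sg=1, vector_size=50, window=5, min_count=1)
print(model.wv["John"])  # the embedding vector of the entity John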
Ultimately, we expect that entities appearing in similar contexts in the knowledge graph
will make them appear also in similar walks (as shown in the example above), which will
lead to similar embedding vectors for those entities.6
Figure 2.2 shows the complete big picture of RDF2vec: first, sequences of entities and
relations are extracted from the graph, then, those sequences are fed into the word2vec
algorithm to create embedding vectors for all entities in the knowledge graph.
There are a few approaches that use the same idea (i.e., extracting walks from graphs,
and learning a language model on those walks). Those approaches are often identical or at
least very similar to RDF2vec. Examples of such approaches include:

• Walking RDF and OWL (Alshahrani et al. 2017) pursues exactly the same idea as
RDF2vec, and the two can be considered identical. It uses random walks and Skip Gram
embeddings. The approach has been developed at the same time as RDF2vec.
• KG2vec (Wang et al. 2021b) pursues a similar idea as RDF2vec by first transforming
the directed, labeled RDF graph into an undirected, unlabeled graph (using nodes for
the relations) and then extracting walks from that transformed graph. Although no direct
comparison is available, we assume that the embeddings are comparable.

6 We will revisit – and, to a certain extent, weaken – that assumption when we discuss different
variants of RDF2vec, in particular order-aware RDF2vec, in Chap. 4.

Fig. 2.2 Overall workflow of RDF2vec

• Wembedder (Nielsen 2017) is a simplified version of RDF2vec which uses the raw triples
of a knowledge graph as input to the word2vec implementation, instead of random walks.
It serves pre-computed vectors for Wikidata.
• KG2vec (Soru et al. 2018) (not to be confused with the aforementioned approach also
named KG2vec) follows the same idea of using triples as input to a Skip-Gram algorithm.
• Triple2Vec (Fionda and Pirró 2019) follows a similar idea of walk-based embedding
generation but embeds entire triples instead of nodes.

For the sake of completeness, one should also mention node2vec (Grover and Leskovec
2016) and DeepWalk (Perozzi et al. 2014), which pursue a similar approach, but are designed for graphs without edge labels, i.e., graphs with only one type of edge. Therefore, they can be
(and have been) applied to knowledge graphs, but do not leverage all the information, since
they treat all edges the same.

2.4 Software Libraries

There are two main libraries that can be used for RDF2vec (in addition to a few more
implementations, which often support different features, but are less well-maintained and/or
documented):

• pyRDF2vec7 (Vandewiele et al. 2022) is a Python-based implementation. It supports many flavors of RDF2vec and comes with an extensible library of walk generation, sampling, and embedding strategies. pyRDF2vec is used for most examples in this book.
• jRDF2vec8 (Portisch et al. 2020b) is a Java-based implementation, which makes all
functionality available through a command line interface. It can be integrated in software
engineering projects as a maven dependency and is also available as a Docker image.

7 https://github.com/IBCNServices/pyRDF2Vec.
8 https://github.com/dwslab/jRDF2Vec.

Both pyRDF2vec and jRDF2vec use the gensim library9 (Řehůřek and Sojka 2010) to
compute the actual word2vec embeddings. Both libraries differ slightly with respect to the
feature set they support. Details are provided in Chap. 4.

2.5 Node Classification with RDF2vec

We will now revisit the node classification example from Sect. 1.3. As a brief recap, the
knowledge graph contains a subset of DBpedia consisting of 100 bands from the rock and
soul genre each, and the task is to predict that genre.
Listing 2.1 shows the code used to classify nodes with RDF2vec.10 With that approach,
we reach an accuracy of 0.700 ± 0.071, which is significantly better than the approaches using simple propositionalization (the best was 0.545 ± 0.057). Furthermore, the number of
features can be directly controlled and does not depend on the graph topology and complexity.
To understand why this approach works better, we use a 2D plot of the resulting embedding
space. For that purpose, we reduce the number of dimensions by computing a Principal
Component Analysis (Abdi and Williams 2010). The code for transforming the resulting
embeddings of Listing 2.1 into a 2D PCA plot is shown in Listing 2.2.
Figure 2.3 shows the PCA plot of both the RDF2vec embeddings, as well as the propo-
sitionalization created in Sect. 1.3. We can observe that the RDF2vec embeddings have


Fig. 2.3 The band dataset from Chap. 1 represented using (a) the propositionalization approach from Chap. 1 and (b) RDF2vec, shown in 2-D PCA projections

9 https://radimrehurek.com/gensim/index.html.
10 We use the pyRDF2vec implementation by Vandewiele et al. (2022) for the code examples through-
out this book. For a full list of implementations of RDF2vec, see http://www.rdf2vec.org.

Listing 2.1 Node classification example with RDF2vec

# Load knowledge graph
from pyrdf2vec.graphs import KG
path = "filepath/"
kg = KG('./artists_graph.nt')

# Load ground truth
import pandas as pd
df = pd.read_csv(path + 'bands_labels.csv', sep="\t")
dfX = df[['Band']]
dfY = df[['Genre']]

# Identify entities to create vectors for
entities = list(dict.fromkeys(df['Band'].to_list()))
kgentities = kg._entities

# Define walk strategy
from pyrdf2vec.walkers import RandomWalker
random_walker = RandomWalker(4, 500)
walkers = []
for i in range(1):
    walkers.append(random_walker)

# Learn RDF2vec model
from pyrdf2vec import RDF2VecTransformer
from pyrdf2vec.embedders import Word2Vec
transformer = RDF2VecTransformer(walkers=walkers,
    embedder=Word2Vec(sg=1, vector_size=50,
                      hs=1, window=5, min_count=0))
embeddings, _ = transformer.fit_transform(kg, entities)

# Evaluate in 10-fold cross-validation
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score
import numpy as np
dfX = pd.DataFrame(list(map(np.ravel, embeddings)))
clf = MLPClassifier(max_iter=10000)
scores = cross_val_score(clf, dfX, dfY.values.ravel(), cv=10)
scores.mean()
scores.std()

a stronger tendency to create visible clusters: for example, the upper left part of the dia-
gram contains mostly rock bands, while the lower part mostly contains soul bands. Such
clusters can be exploited by downstream classifiers. On the other hand, the plot of the
propositionalization approach shows no such class separation, indicating that it is harder
for the downstream classifier to predict the correct class. This is also reflected in the better
classification results for RDF2vec in this case.

Listing 2.2 Visualizing node embeddings in a 2D PCA

# Visualize in 2D PCA
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Compute PCA
pca = PCA(n_components=2)
pca_result = pca.fit_transform(dfX)
principalDf = pd.DataFrame(data=pca_result,
    columns=['principal component 1',
             'principal component 2'])
finalDf = pd.concat([principalDf, dfY], axis=1)

# Prepare diagram
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(1, 1, 1)
ax.set_xlabel('Principal Component 1', fontsize=15)
ax.set_ylabel('Principal Component 2', fontsize=15)

# Create color codes
targets = ['Soul', 'Rock']
colors = ['r', 'b']

# Create plot
for target, color in zip(targets, colors):
    indicesToKeep = finalDf['Genre'] == target
    ax.scatter(
        finalDf.loc[indicesToKeep, 'principal component 1'],
        finalDf.loc[indicesToKeep, 'principal component 2'],
        c=color)
ax.legend(targets)
ax.grid()

2.6 Conclusion

In this chapter, we have seen a first glance at RDF2vec embeddings and grasped an under-
standing of how they are computed. We have observed that they create vector representations
of nodes that can be used for downstream classification tasks because the resulting distributions separate classes better than those of classic propositionalization techniques. At the same time, we
are able to limit the dimensionality of the resulting feature space.
While this chapter covered only the basic variant RDF2vec, quite a few variants have
been proposed for RDF2vec, which affect both the generation of the walks as well as the
computation of the word embedding model. In the subsequent chapters, we will investigate
a few of those variants, and discuss their impact on the resulting embeddings.

References

Abdi H, Williams LJ (2010) Principal component analysis. Wiley Interdiscip Rev Comput Stat
2(4):433–459
Alshahrani M, Khan MA, Maddouri O, Kinjo AR, Queralt-Rosinach N, Hoehndorf R (2017) Neuro-
symbolic representation learning on biological knowledge graphs. Bioinformatics 33(17):2723–
2730
Bengio Y, Ducharme R, Vincent P (2000) A neural probabilistic language model. In: Advances in
neural information processing systems 13
Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives.
IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828
Devlin J, Chang MW, Lee K, Toutanova K (2018) Bert: Pre-training of deep bidirectional transformers
for language understanding. arXiv:1810.04805
Fionda V, Pirró G (2019) Triple2vec: learning triple embeddings from knowledge graphs.
arXiv:1905.11691
Firth JR (1957) A synopsis of linguistic theory, 1930-1955. In: Studies in linguistic analysis
Grover A, Leskovec J (2016) node2vec: scalable feature learning for networks. In: Proceedings of
the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp
855–864
Jatowt A, Tanaka K (2012) Is wikipedia too difficult? comparative analysis of readability of wikipedia,
simple wikipedia and britannica. In: Proceedings of the 21st ACM international conference on
Information and knowledge management, pp 2607–2610
Khurana D, Koli A, Khatter K, Singh S (2022) Natural language processing: state of the art, current
trends and challenges. In: Multimedia tools and applications, pp 1–32
Kumar G, Bhatia PK (2014) A detailed review of feature extraction in image processing systems.
In: 2014 fourth international conference on advanced computing & communication technologies.
IEEE, pp 5–12
Mikolov T, Chen K, Corrado G, Dean J (2013a) Efficient estimation of word representations in vector
space. arXiv:1301.3781
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013b) Distributed representations of words
and phrases and their compositionality. In: Advances in neural information processing systems, pp
3111–3119
Nielsen FÅ (2017) Wembedder: wikidata entity embedding web service. arXiv:1710.04099
Nixon M, Aguado A (2019) Feature extraction and image processing for computer vision. Academic
press
Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: online learning of social representations. In:
Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and
data mining, pp 701–710
Portisch J, Hladik M, Paulheim H (2020b) Rdf2vec light–a lightweight approach for knowledge graph
embeddings. In: International semantic web conference, posters and demonstrations
Řehůřek R, Sojka P (2010) Software framework for topic modelling with large corpora. In: Proceed-
ings of the LREC 2010 workshop on new challenges for NLP frameworks, ELRA, Valletta, Malta,
pp 45–50. http://is.muni.cz/publication/884893/en
Ristoski P, Paulheim H (2016) Rdf2vec: Rdf graph embeddings for data mining. In: International
semantic web conference. Springer, pp 498–514
Ruder S, Vulić I, Søgaard A (2019) A survey of cross-lingual word embedding models. J Artif Intell
Res 65:569–631
Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G (2008) The graph neural network
model. IEEE Trans Neural Netw 20(1):61–80

Schlichtkrull M, Kipf TN, Bloem P, Berg Rvd, Titov I, Welling M (2018) Modeling relational data
with graph convolutional networks. In: European semantic web conference. Springer, pp 593–607
Schmidhuber J (2015) Deep learning. Scholarpedia 10(11):32832
Scott S, Matwin S (1999) Feature engineering for text classification. ICML, Citeseer 99:379–388
Soru T, Ruberto S, Moussallem D, Valdestilhas A, Bigerl A, Marx E, Esteves D (2018) Expeditious
generation of knowledge graph embeddings. arXiv:1803.07828
Vandewiele G, Steenwinckel B, Agozzino T, Ongenae F (2022) pyrdf2vec: a python implementation and extension of rdf2vec. https://doi.org/10.48550/arXiv.2205.02283
Wang Y, Dong L, Jiang X, Ma X, Li Y, Zhang H (2021b) Kg2vec: a node2vec-based vectoriza-
tion model for knowledge graph. Plos one 16(3):e0248552. https://doi.org/10.1371/journal.pone.
0248552

3 Benchmarking Knowledge Graph Embeddings

Abstract

RDF2vec (and other techniques) provide embedding vectors for knowledge graphs. While
we have used a simple node classification task so far, this chapter introduces a few
datasets and three common benchmarks for embedding methods—i.e., SW4ML, GEval,
and DLCC—and shows how to use them for comparing different variants of RDF2vec.
The novel DLCC benchmark allows us to take a closer look at what RDF2vec vectors
actually represent, and to analyze what proximity in the vector space means for them.

3.1 Node Classification with Internal Labels—SW4ML

In the previous chapter, we have motivated the use of knowledge graph embeddings for
learning predictive models on entities in knowledge graphs. In the running example we
introduced, the task was to predict the genre of bands represented in a knowledge graph—
i.e., the task is node classification.
In the running example in the previous chapters, we used one particular relation in the
knowledge graph—the genre relation—and removed it for prediction. This is a common
way of creating benchmarks for node classification, and there are a few datasets that are
used for benchmarking embeddings which use this approach.
The SW4ML benchmark,1 introduced in Ristoski et al. (2016), uses four different existing
knowledge graphs and holds out a discrete label for certain nodes:

• The AIFB dataset describes the AIFB research institute in terms of its staff, research
group, and publications. In Bloehdorn and Sure (2007) the dataset was first used to predict

1 http://w3id.org/sw4ml-datasets.


Table 3.1 Characteristics of the SW4ML datasets with internal labels. The numbers on the left-
hand side show the statistics of the knowledge graph at hand, i.e., the number and average degree of
instances, number of classes, and object properties used in the graph (since RDF2vec only considers
relations between resources, but no literals, we only included those in the computation), the numbers
on the right-hand side depict the statistics of the classification problem, i.e., number of labeled
instances, and number of distinct class labels
Dataset # Instances Avg. Degree # Classes # Relations # Instances # Labels
AIFB 2,548 12.5 57 72 176 4
AM 367,236 8.6 12 44 1,000 11
MUTAG 22,372 3.6 142 4 340 2
BGS 101,458 5.4 6 41 146 2

the affiliation (i.e., research group) for people in the dataset. The dataset contains 178
members of research groups; however, the smallest group contains only 4 people and was removed from the dataset, leaving four classes. Furthermore, the employs relation,
which is the inverse of the prediction target relation affiliation, has been removed.
• The AM dataset contains information about artifacts in the Amsterdam Museum (de Boer
et al. 2012). Each artifact in the dataset is linked to other artifacts and details about its
production, material, and content. It also has an artifact category (e.g., painting, sculpture,
etc.), which serves as a prediction target. For SW4ML, a stratified random sample of 1,000
instances was drawn from the complete dataset. Moreover, the material relation has
been removed, since it highly correlates with the artifact category.
• The MUTAG dataset is distributed as an example dataset for the DL-Learner toolkit.2 It
contains information about complex molecules that are potentially carcinogenic, which
is given by the isMutagenic property.
• The BGS dataset was created by the British Geological Survey and describes geological
measurements in Great Britain.3 It was used in de Vries (2013) to predict the lithogen-
esis property of named rock units. The dataset contains 146 named rock units with a
lithogenesis, from which the largest two classes are used.

Table 3.1 depicts the characteristics of the four datasets.


In Ristoski et al. (2019), we have run experiments on the SW4ML dataset, including
different variants of RDF2vec. The experiments also included basic propositionalization
techniques introduced in Chap. 1. The results are depicted in Table 3.2.
We can make a few observations from this table. First of all, RDF2vec is always able to
outperform the baselines, and the relations to individuals approach is not scalable enough
for the MUTAG dataset, as discussed above.

2 http://dl-learner.org/.
3 http://data.bgs.ac.uk/.

Table 3.2 Results (accuracy) of different propositionalization and embedding variants on the
SW4ML datasets, taken from Ristoski et al. (2019). The table only depicts the results achieved with
a linear SVM, which was the best-scoring approach according to the original publication. Results for
RDF2vec are reported for CBOW and Skip Gram (SG), and 200 and 500 dimensions, respectively.
Results marked with "–" did not finish within a time limit or ran out of RAM
Approach/Dataset AIFB AM MUTAG BGS
Relations 0.501 0.644 – 0.720
Relations to individuals 0.886 0.868 – 0.858
RDF2vec CBOW 200 0.795 0.818 0.803 0.747
RDF2vec SG 200 0.874 0.872 0.779 0.753
RDF2vec CBOW 500 0.874 0.872 0.779 0.753
RDF2vec SG 500 0.896 0.782 0.781 0.882

When looking at the results for the different variants of RDF2vec, the results are less
conclusive. In general, the results using 500 dimensions are often superior to those with
200 dimensions. Moreover, the skip-gram result often outperforms the results achieved with
CBOW. Nevertheless, there are also cases where these observations do not hold (see, e.g.,
the good performance of CBOW with 200 dimensions on MUTAG).

3.2 Machine Learning with External Labels—GEval

The original version of the SW4ML benchmark also contained further test cases. Those
are based on the existing knowledge graph DBpedia, and use various external variables
as prediction targets. GEval supports a wider range of data mining tasks (i.e., not only
classification). In general, to evaluate an embedding method with GEval, one needs to first
compute embedding vectors for DBpedia. With those vectors, GEval performs different runs
with predictions and also performs parameter tuning on the respective prediction operators,
as shown in Fig. 3.1. Due to the systematic nature of the benchmark, results for different
embedding methods can be directly compared.
In total, GEval comprises six different tasks, many of which have different test cases,
making 20 test cases in total (Table 3.3):

• Five classification tasks, evaluated by accuracy. Those tasks use the same ground truth as
the regression tasks (see below), where the numeric prediction target is discretized into
high/medium/low (for the Cities, AAUP, and Forbes datasets) or high/low (for the Albums
and Movies datasets). All five tasks are single-label classification tasks.
• Five regression tasks, evaluated by root mean squared error (RMSE). Those datasets are
constructed by acquiring an external target variable for instances in knowledge graphs

Fig. 3.1 Schematic depiction of the GEval framework

which is not contained in the knowledge graph per se. Specifically, the ground truth
variables for the datasets are: a quality of living indicator for the Cities dataset, obtained
from Mercer; average salary of university professors per university, obtained from the
AAUP; profitability of companies, obtained from Forbes; average ratings of albums and
movies, obtained from Facebook.
• Four clustering tasks (with ground truth clusters), evaluated by accuracy. The clusters
are obtained by retrieving entities of different ontology classes from the knowledge
graph. The clustering problems range from distinguishing coarser clusters (e.g., cities vs.
countries) to finer ones (e.g., basketball teams vs. football teams).
• A document similarity task (where the similarity is assessed by computing the similarity
between entities identified in the documents), evaluated by the harmonic mean of Pearson
and Spearman correlation coefficients. The dataset is based on the LP50 dataset (Lee
et al. 2005). It consists of 50 documents, each of which has been annotated with DBpedia
entities using DBpedia spotlight (Mendes et al. 2011). The task is to predict the similarity
of each pair of documents. In the GEval framework, this similarity is computed from the
pairwise similarities of the entities in the documents.
• An entity relatedness task (where semantic similarity is used as a proxy for semantic relat-
edness), evaluated by Kendall’s Tau. The dataset is based on the KORE dataset (Hoffart

Table 3.3 Overview of the evaluation datasets in GEval


Task Dataset # entities Target variable
Classification Cities 212 3 classes (67/106/39)
AAUP 960 3 classes (236/527/197)
Forbes 1,585 3 classes (738/781/66)
Albums 1,600 2 classes (800/800)
Movies 2,000 2 classes (1,000/1,000)
Regression Cities 212 Numeric [23, 106]
AAUP 960 Numeric [277, 1009]
Forbes 1,585 Numeric [0.0, 416.6]
Albums 1,600 Numeric [15, 97]
Movies 2,000 Numeric [1, 100]
Clustering Cities and countries (2k) 4,344 2 clusters (2,000/2,344)
Cities and countries 11,182 2 clusters (8,838/2,344)
Cities, countries, albums, movies, AAUP, forbes 6,357 5 clusters (2,000/960/1,600/212/1,585)
Teams 4,206 2 clusters (4,185/21)
Document similarity Pairs of 50 documents with entities 1,225 Numeric similarity score [1.0, 5.0]
Entity relatedness 20×20 entity pairs 400 Ranking of entities
Semantic analogies (All) capitals and countries 4,523 Entity prediction
Capitals and countries 505 Entity prediction
Cities and states 2,467 Entity prediction
Countries and currencies 866 Entity prediction

et al. 2012). The dataset consists of 20 seed entities from the YAGO knowledge graph,
and 20 related entities each. Those 20 related entities per seed entity have been ranked by
humans to capture the strength of relatedness. The task is to rank the entities per seed by
relatedness. In the GEval framework, this ranking is computed based on cosine similarity
in the embedding vector space.
• Four semantic analogy tasks (e.g., Athens is to Greece as Oslo is to X), which are based
on the original datasets on which word2vec was evaluated (Mikolov et al. 2013). The
datasets were created by manual annotation. The goal of the evaluation is to predict the
fourth element (D) in an analogy A : B = C : D by considering the closest n vectors to
B − A + C. If the element is contained in the top n predictions, the answer is considered to be correct, i.e., the evaluation metric is top-n accuracy. In the default setting of the evaluation framework used, n is set to 2 (a minimal sketch of this computation is shown below).
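The following sketch shows how such an analogy can be solved by vector arithmetic, assuming a gensim KeyedVectors object kv holding the entity embeddings; the entity labels are illustrative:

# Sketch: solving an analogy A : B = C : ? via vector arithmetic.
def solve_analogy(kv, a, b, c, topn=2):
    # Return the topn entities closest to the vector B - A + C
    return kv.most_similar(positive=[b, c], negative=[a], topn=topn)

# The analogy counts as correct if the expected D is among the topn candidates:
# solve_analogy(kv, "Athens", "Greece", "Oslo")  # expect "Norway" in the top 2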

Table 3.4 shows the results of the two basic variants of RDF2vec on GEval. It shows that
the already observed trend of skip-gram being superior to CBOW also holds in this case.

Table 3.4 Results on GEval for RDF2vec


Task Dataset CBOW SG
Classification Cities 0.725 0.818
Movies 0.549 0.726
Albums 0.536 0.586
AAUP 0.643 0.706
Forbes 0.575 0.623
Regression Cities 18.963 15.375
Movies 24.238 20.215
Albums 15.812 15.288
AAUP 77.250 65.985
Forbes 39.204 36.545
Clustering Cities and countries (2k) 0.520 0.789
Cities and countries 0.783 0.587
Cities, albums, movies, AAUP, forbes 0.547 0.829
Teams 0.940 0.909
Document similarity LP50 0.283 0.237
Entity relatedness KORE 0.611 0.747
Semantic analogies Capitals-countries (all) 0.594 0.905
Capitals-countries 0.810 0.957
Cities-states 0.507 0.609
Countries-currencies 0.338 0.574

While GEval is a useful tool to conduct evaluations on a variety of tasks and to get an idea
of how well different flavors of embeddings work for different downstream applications,
comparing results across papers using GEval is not that easy. Many papers use GEval or
subsets of GEval to report on RDF2vec variants and extensions, and we will also show
different variants of RDF2vec and discuss the results achieved on those benchmarks in the
next chapter. However, as those results are taken from different papers, they are not always
fully comparable, as they may use different versions and/or subsets of DBpedia.

3.3 Benchmarking Expressivity of Embeddings—DLCC

When building an application using knowledge graph embeddings, and the task in the appli-
cation is known, the above benchmarks may give some guidance in choosing an embedding
approach. However, they do not say much about what those embedding methods can actually
represent and what they cannot represent.
Another random document with
no related content on Scribd:
quexas en vuestra presencia; no
que yo, señora, de vos me quexe
ni Dios lo quiera, que no deuo
más para que las pasiones que
con mis deseos me aquexan
sepays, por merito de las quales
os suplico que no medido lo que
yo en respecto vuestro me
merezco, mas considerado lo que
por haueros visto e desear ser
vuestro padezco, por tal señora
me acepteys; no para dar más
bien a mi mal de consentir que yo
señora por vuestro seruicio lo
padezca, por que ni más osaria,
señora, pedir, ni tanto me
atreueria creer merecer.

BELISENA
Muchos dias ha, Flamiano, que
conozco en tus meneos lo que el
desuario de tu pensamiento te ha
puesto en la voluntad; e no creas
que muchas vezes dello no haya
recebido enojo, e algunas han
sido que me han puesto en
voluntad de dartelo a entender,
sino que mi reputacion e
honestidad me han apartado
dello, e aun en parte el respecto
de la buena figura en que tu
discrecion hasta agora he tenido.
Mas pues que tu atreuimiento en
tal estremo te ha traydo, que en
mi presencia tu fantasia hayas
osado publicar, forçado me será
responderte, no lo que dezirte
queria segun mi alteracion, mas
segun la vanidad de tu juyzio
merece. Lo qual aunque consejo
te parezca deues tomar por
reprehension; e digo que no te
acontezca semejante
pensamiento poner en parte
differente de ti, donde no puedas
menos hazer de verte cada hora
en infinitas necessidades e al fin
sin ver cabo á lo que desseas,
que lo hayas de ver de tu vida y
de tu honrra. Mas razon seria que
primero ygualasses la medida
donde bastas llegar con el
merecer, que no que publicasses
do querrias subir con el dessear e
aun alli, segun se suele, hallarás
tarde el contentamiento que el
deseo querria.

FLAMIANO
Mis ojos, señora, que de mis
males han sido la causa, no
tuvieron juyzio más de para
miraros e ver las perficiones que
Dios en vos puso, para que
viendoos pusiesen mi corazon en
el fuego que arde; llegada alli
vuestra figura, no pudo menos
hazer de lo que ha hecho. Mi
saber no pudo ser tanto para
temer los inconuenientes de mi
daño que vuestra hermosura no
fuesse más para causallo sin
poder ser resistido. Pues llegado
aqui mi pensamiento determinose
en que lo mucho que el merecer
desyguala mi pena del desseo,
las sobras della misma son tantas
que lo yguala todo, pues que,
señora, mi intencion no os pide
mas de licencia para padescer,
que desta suerte cierto no puede
ser reprouada pues que no es
mala. Ansi que, señora, pues que
tanto la virtud y nobleza en vos
sobra, no useys comigo por el
rasero de la crueza, pues que
mudarse ya mi cuydado es
imposible. E assi de vos no quiero
consejo; remedio es el que pido
pues que no le puedo esperar
sino de vuestra mano.

BELISENA
No creas tú, Flamiano, que la
pasion o males que publicas que
sientes, a mí dellos me plega,
ante en muchas maneras dello
me pesa. Lo vno es que á mi
causa siendo en mi perjuyzio tú
los padezcas. Lo segundo que te
atreues á ponerte en ello y aun
publicarlo. De suerte que en
muchas maneras me enojas y en
más me harias plazer y servicio
que dello te dexases. Y esto seria
seruirme como dizes que
desseas; para esto que te digo,
como ya te he dicho, los
inconuenientes de mi estado y de
mi condicion y honestidad me dan
inconueniente no solo para que
como hago dello reciba mucho
enojo, mas para que tú aunque
mill vidas como dizes perdiesses
yo dellas haya de hazer ni cuenta
ni memoria. Assi que lo mejor
será que desto te apartes e en
esto me harás seruicio como
dizes que desseas y aun me
ternas haziendolo contenta; e
pues que tanto mio eres, segun
dizes, yo te mando que lo hagas,
porque quites tu vida de peligro e
aun a mí de ser enojada.

FLAMIANO
Quando, señora, la pena
verdadera de amor como es la
mia está sellada en el alma, pues
que justa razon alli la haya
puesto, en el coraçon está
imprimida de suerte que sin él e
sin ella no pueda salir de alli.
Pues ¿como quereys, señora,
que mi cuydado se mude?, que el
dia primero que os vi, dentro en
mis entrañas e coraçon quedó el
propio traslado vuestro
perfectamente esculpido, e
despues aca quantas estradas
me haueys tirado que son
infinitas, llegadas alli, el fuego que
en tal lugar hallan las funde,
porque son de oro siendo
vuestras e fundidas hallan alli
vuestra effigia e de cada vna
dellas se haze vn otra semejante.
Assi que aunque el coraçon y el
alma con las principales
sacassen, el cuerpo quedaria
lleno con tantas que de aqui a mill
años en mi sepultura se hallarian
dellas sin cuento, e aun en todos
mis huessos se hallaria vuestro
nombre escripto en cada vno.
Ansi, que señora, si quereys que
de quereros me aparte, mandad
sacar mis huessos e raer de alli
vuestro nombre, e de mis
entrañas quitar vuestra figura,
porque ya en mi está conuertido
en que si alguno me pide quien so
digo que vuestro. E si esto a
desuario se me juzgasse, mayor
lo haria quien tal quissiese juzgar,
porque no hay nayde que con mis
ojos, señora, os mire que no
conozca ser justo lo que hago; e
como ya he dicho, aunque en la
razon mia encobrir lo quisiesse no
puedo, porque el fuego de dentro
haze denunciar a la lengua la
causa. Pero pues que en vuestra
mano está matarme o darme la
vida, e pues que della teneys la
llaue, ved vos si lo podeys hazer
e ganareys la victoria del tal
vencimiento. E si con quitarme la
vida pensays acabarlo, dudolo,
porque aunque del coraçon e las
otras partes vos apartassedes
con matarme, ni mas ni menos en
el alma os quedariades, de do
jamas os podreys quitar porque
es inmortal a causa de estar vos
en ella. E si de mi se partiesse
donde agora mis passiones la
tienen presa y atormentada,
jamas de vuestra presencia se
partiria, donde con mucho
contentamiento estaria contino.
Assi que si agora estando comigo
os enoja ausente, mira que hará
entonces estando presente, e
bien sé que pues agora os
enojays por seros yo de mi grado
captiuo, que despues de yo
muerto más enojo recibireys de
vos matadora, e sola esta gloria
que de mi muerte se espera me
basta a mi para que contento
pierda la vida, pues que con ello
yo seré fuera de pena e vos con
pesar arrepentida. Podreys
señora dezir entonces que no es
vuestro el cargo sino mia la culpa
pues que yo mesmo me lo he
buscado y querido mi daño contra
vuestra voluntad. Entonces mi
alma os negará la partida
diziendo: no, no, no es ansi, que
el cargo, señora, tuyo es pues
que tan cruelmente tan mal le
trataste no pidiendote más bien
de licencia para sofrir su mal sin
ninguna offensa tuya ni más gloria
suya.
BELISENA
Si sofrirte lo que faces me
offende, oyrte lo que dizes me
perjudica y enoja; ¿qué hará
responder a la vanidad de tus
razones? Yo te he ya dicho lo que
te cumple, bastarte deue para no
esperar mas disputa en este caso
de lo que te conuiene. No delibero
mas sobre ello hablarte, porque
creo que tu discrecion te hará
determinar lo que te cumple. Los
mios vienen, quedate con Dios y
creeme haziendo lo que te tengo
dicho.

FLAMIANO
Digo, señora, finalmente que no
puedo porque ni mi voluntad a
ello no puede doblarse, ni mi
querer puede dello quitarse, e
aunque aquí tan solo de bien e
tan acompañado de pesar me
dexeis, digo que allá donde vos
vays, allá voy, y aunque vos vays,
aqui quedays donde yo quedo,
porque ni allá, ni acá, ni en
ninguna parte donde yo me halle,
nunca vuestra vista de mis ojos
se quita, sino que en mi fantasia
do quiera que esteys, do quier
que esten, los dos juntos
estamos. E si esto, señora, no
creeys, mis obras os haran dello
testigo.
Al fin la señora Belisena se partio
con Isiana e muy enojada, a lo
que mostraua, e llegó a la
compañia de los suyos. Flamiano
quedó a solas, fuesse por otra via
con el consuelo que pensar
podeys; en aquella noche todos
los caualleros cenaron con el
señor cardenal, donde se
concerto de yr venidos de la caça
a vnos baños que ocho millas de
la ciudad estan de la mar, en vn
muy hermoso lugar que Virgiliano
se llama, porque supieron que la
señora duquesa e la princesa de
Salusano con otras muchas
damas se yuan por estar alli todo
el mes de Abril, como cada año
las damas y señoras de
Noplesano acostumbran hazer.
Visto Flamiano que esta jornada
se le aparejaua conforme a su
desseo, suplicó al señor cardenal
que ordenase vn juego de cañas
para el segundo dia de pasqua
que todas las damas ya a
Virgiliano serian venidas. De lo
qual el señor cardenal, fue tan
contento que se ofrecio tener el
vn puesto con la meytad de
aquellos caualleros, desta
manera: que los de su puesto
saldrian a la estradiota vestidos
como turcos con mascaras y
rodelas turquescas, vestidos
todos de las colores que su
señoria les daria, y que jugarian
con alcanzias. E que Flamiano
tuviesse el otro puesto a la gineta
con los otros caualleros que alli
primero se hallaron en la caça. E
que ante que al puesto saliessen,
que saliessen ellos todos juntos e
començassen su juego de cañas
partidos por medio. En el qual
juego él con sus turcos llegaria
como hombre que viene de fuera,
e assi juntados ellos todos,
començarian el otro juego contra
los que en él viniessen. E ansi el
señor cardenal tomó a cargo de
suplicar a la señora princesa que
para aquella noche conbidase a la
señora duquesa e á Belisena, con
todas las otras damas que alli se
hallassen, para que en su posada
aquella noche passado el juego
todas cenassen y alli hiziessen la
fiesta. Pues acabada la caça,
dende a dos dias con mucho
plazer los vnos e los otros todos
juntos a la ciudad se tornaron.
Donde despues de llegados,
Flamiano acordo de enbiar a
Felisel a visitar a Vasquiran con el
qual acordo respondelle a su
carta. E despachado que le houo,
Felisel se partio, e llegado a
Felernissa donde halló a
Vasquiran, despues de hauer
hablado mucho con él en especial
de las cosas dela caça e lo que
en ella se era seguido, la carta de
Flamiano le dió, la qual en esta
manera razonaua.

CARTA DE FLAMIANO Á
VASQUIRAN EN RESPUESTA
DE LA SUYA POSTRERA
No quiero, Vasquiran, dexarme de
responder a tus cartas e quexas,
si quiera porque no pienses que
razon me falta para ello, como a ti
crees que te sobra para lo que
hazes. Yo, si bien me entiendes,
no digo que de la muerte de
Violina no te duelas como es
razon que lo hagas, mas que los
estremos dexes e apartes de ti,
pues que in genere son
reprobados; porque como ya te
he dicho y tú dizes, tus lastimas
todas la muerte las ha causado, y
en verdad al parecer estas son
las mas crudas de sofrir, y al ser
las mas leues de conortar, pues
como dicho tengo, el tiempo e la
razon naturalmente las madura e
aplaca de tal suerte que assi
como la carne muerta en la
sepultura se consume, assi el
dolor que dexa en la viua se
resfria. Porque si assi no fuesse,
muchas madres que
ardientemente los hijos aman e
los pierden, por ser fragiles para
soffrir el dolor con la braueza dél,
con la flaqueza de la complision,
si este remedio el tiempo
naturalmente no les pusiesse, las
mas dellas del seso o de la vida
vernian a menos, e aun algunos
padres lo mismo harian, e otras
muchas personas que de
conjunto amor contentos
acompañados viuian como tú
hazias. Empero como he dicho el
natural remedio lo remedia
continuamente, e donde este
faltasse o si assi no fuese, digo
que por razon más obligado
serias segun quien eres a hazer
lo que digo que lo que hazes, por
muchas causas que ya te tengo
dichas, porque como sabes, la
estremidad del plañir nace de la
voluntad, la virtud del soffrir es
parte de la razon.
Pues mira quan grande es,
nuestra differencia entre la
voluntad é la razon. Lo vno parte
de discrecion e cordura; lo otro o
es o está a dos dedos de locura,
en especial que los virtuosos
varones más son conocidos en
las aduersidades por su buen
seso e sofrimiento que no en las
prosperidades por grandezas ni
gouierno; porque lo vno muchos
respectos lo pudieron causar para
hazerse, lo otro sola virtud lo
templa para sofrirse. Assi que por
todas las partes verás que por
fuerça tu dolor ha de menguar.
Mas ¿qué hare yo que si sola vna
vez que vi a la que mi mal ordena,
de tantos malos me fue causa?
en las otras que la veo ¿qué
puedo sentir? Su ausencia me
atormenta de passion; su
presencia me condena de temor;
su condicion e valer me quitan
esperança; mi suerte y ventura
me hazen desconfiar. Mi pena me
da congoxa incomportable. Lo
que siento me haze dessear la
muerte; remedio en mi no le hay;
della no se espera. E assi tengo
más aparejado el camino de
desesperar que abierta la puerta
de esperança para ningun bien.
Assi que por Dios te ruego que
comiences á poner consuelo en ti,
porque puedas presto con tu
compañia venir a poner remedio
en mí, y con tal confiança me
quedo cantando este villancico
que a mi proposito haze y a mi
pesar he hecho.

Yo consiento por seruiros


mi muerte sin que se sienta
vos señora no contenta.
El primer dia que os vi
tan mortal fue mi herida
que en veros me vi sin vida
y el viuir se vio sin mi,
pues que en viendoos
consenti
mis males que son sin cuenta,
vos señora mal contenta.
Consenti verme sin ella
solamente por miraros
y por solo dessearos
tuue por bueno perdella;
y más que los males della
quise qu'el alma los sienta
y vos dello descontenta.
Consenti que mi tormento
tan secreto fuese y tal,
que el menor mal de mi mal
diesse muerte al sentimiento;
quise más qu'el soffrimiento
que lo suffra y lo consienta
por hazeros más contenta.
De suerte que mis sospiros
aunque sean sin compas
los quiero sin querer mas
de quereros y seruiros,
sin más remedio pediros
de la muerte que m'afrenta
que veros della contenta.

LAS COSAS QUE VASQUIRAN


CONTO A FELISEL
DESPUES DE LEYDA LA
CARTA, QUE LE HAUIAN
SEGUIDO YENDO A CAÇA
After Vasquiran had read the letter that Felisel gave him, talking of many things, Felisel told him everything about the hunt: the knights and ladies who took part, the finery that all brought out, and even part of what his master exchanged with Belisena, speaking with her alone. Having related it all very well, the next day, as the two walked up and down a hall as they were wont to do, Vasquiran began to say to him: Since yesterday, Felisel, you told me all the mysteries of the hunt you had there, and even what befell your master in it, I want to tell you what happened to me in another. Flamiano, as you say, went to accompany her who keeps him accompanied by enamored thoughts, and also to give rest to his eyes with the sight of her. I, to accompany my solitude with more solitude, and to give my eyes, along with it, more company of tears, with less finery and more anguish, last week also went hunting; and in that hunt there befell me what you shall now hear.

VASQUIRAN RECOUNTS TO FELISEL WHAT BEFELL HIM IN THE HUNT, AND THE WORK HE COMPOSED UPON IT
While these servants of mine were posted with their hounds at their stations, as I had left them, it happened that a stag and a hind together came upon one of them; whereupon the dogs were slipped and began to chase them across a plain that lay between the stations and a wood. The hounds being very good, they overtook them, and in the chase the hind was forced to part from her companion and came to where I stood, by her misfortune and mine; and as I saw her coming I cut across her path, and before she could reach the wood I killed her.

When some of these servants of mine had come up, since it was already growing late I had her loaded on a mule with the other game we had killed, and I set out toward that estate of mine where you found me the other time. And when we had drawn some distance from the wood, we heard the greatest bellowing in the world; and on hearing it we stopped to learn what it might be, and saw coming, bellowing, a stag that had escaped from us into the wood, the very one that had come in the hind's company; and neither for fear of the hounds that went out to meet him, nor for all that my men did to hinder him, did he ever leave off his course until he reached the mule on which the hind was carried. And when I saw him I guessed what it must be, as indeed it was, though it seem a miracle; and so I ordered that no one should do him harm. When he had come where his grief guided him, he began anew to give far greater bellows, pouring endless tears from his eyes. Seeing him show such grief, my own wound began to open afresh, so that, fearing in myself some faintness that might put me to shame, I ordered that he be left alone and went on my way; but when he saw us depart, with greater moans he began to follow us until he came where I was going, from whence he has never departed.

When I saw this, I ordered that the hind be flayed and her hide stuffed with hay and hung within the garden, in a gallery there, just high enough that the stag could reach only her head. And from the day they placed it there I had the stag brought inside, and he has never departed from where the hind is, save when, driven by hunger, he goes off a little to graze about the orchard.

What I have told you, Felisel, filled me with such sadness that after supping alone, withdrawn in my chamber, with all my past glories and my present anguish coming back to my memory, and judging by what this unreasoning creature did what I by reason ought to do, I began with endless tears, cursing my misfortune, to speak against myself endless and most pitiful words, so many that it would be long to recount them. Save that, being thus, I felt my senses begin to fail me, and I know not whether transported out of my judgment or overcome by grief and sleep, I saw in a vision all the things that I am sending your master within a letter I have already written him, which you will see in rhymed verses, composed more as I knew how than as I ought or would have wished. And afterwards I made upon this case of the stag this song, which I have not wished your master to see, lest he find in it matter with which to answer my letter as he is wont.

What grief can I lament
of my anguish and my ills,
seeing that the animals
feel my sorrow more?
I shall complain of my grief,
for its torment is so cruel
that a brute without feeling
feels it far greater
from the sorrow that I feel;
yet it cannot be equaled
with my mortal anguish,
for more than my ills the soul
feels my sorrow greater.

When he had finished reciting the song to him, he said: Felisel, I would have you depart tomorrow, to carry to Flamiano a jennet of mine with a fine harness, which was brought to me from Spain not long ago, that it may be of use to him, since it can serve me no longer. I would have you arrive in time for it to serve him in the game of canes you have told me of. The next day, having received the horse and the letter, Felisel departed. Arriving at Noplesano, he found that Flamiano and all the knights had already departed for Virgiliano, because the lady duchess and the princess with all the ladies were already there. Felisel arrived there the next day, and Flamiano was much cheered by him and took great pleasure in hearing him tell what had happened to Vasquiran, and also in the horse, which was very good, and the harness, which was very rich, especially coming at such a time. And having received the letter he began to read it, and it said thus.

LETTER FROM VASQUIRAN TO FLAMIANO IN REPLY TO HIS
How much better it would be, Flamiano, for us to put this dispute to silence than to pursue it, since it brings so little profit to either of us. You tell me that you do not reprove me for grieving at my ill, since it is right that I should, but only that I ought not to grieve to such an extreme. I would rather you had set limits to my ill, that it were not so great; for my sadness is small beside it. You say that as the dead flesh is consumed in the grave, so the grief it leaves in the living grows cold; that argument is false, for in myself, who put it to the proof, I see the contrary. Again you allege to me the women who would lose their wits were it not so. In faith, it is because they are weak of sense and frail that they lose the memory of it, and not for the reason you give. If it were seemly for me to allege to you things of our faith, I would tell you one thing, of her who had no peer, and of what she did in such a case, that would silence you. You also allege to me, like a philosopher, what springs from the will and what from reason, and which act is the more virtuous, and you shoot wide of the mark; for those who have glossed upon this, Juan de Mena especially, and many others, set no opposition in such a case between the will and reason, save in those appetites that nature viciously displays as willful desire; for when anyone grieves for the thing he loved, that grief springs, on seeing it lost, from pure love and gratitude and the contentment he had in it. Such acts, then, are virtuous and reasonable, not willful will. So philosophy avails you nothing with me; it would profit you little, nor would it Aristoteles himself, if he felt my ill. Petrarca knew more than you or I, and you well know what he answered when he was reproached because, twenty years after madama Laurea was dead, he still mourned her and served her, when he said: What healing did it give my wound, that the bowstring broke? Of your ill I never saw any martyr; of mine you will see all the poems and writings full, from the beginning of the world until now; of which we still have the blood of the martyr Garcisanchez fresh, and that of the same Petrarca I have told you of not forgotten, besides countless others of whom nothing is written. You find no remedy for yourself, who every day speak, or may speak, with her who pains you; yet you would find one for me, who have none. You also tell me that if the first sight caused you so much ill, what will you feel at the others? I say that the first time made you fall in love, and the others make you fall in love anew; all the ill her absence causes you is desire to see her, and that which her presence works in you is desire to covet her. In short, they are vanities woven the one with the other; but if you wish
