Professional Documents
Culture Documents
Book Lisp
Book Lisp
Uses the Free Editions of Franz Common Lisp and AllegroGraph Mark Watson Copyright 2010 Mark Watson. All rights reserved. This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works Version 3.0 United States License.
November 3, 2010
Contents
Preface 1. Getting started . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2. Portable Common Lisp Code Book Examples . . . . . . . . . . . . 3. Using the Common Lisp ASDF Package Manager . . . . . . . . . . 4. Information on the Companion Edition to this Book that Covers Java and JVM Languages . . . . . . . . . . . . . . . . . . . . . . . . . 5. AllegroGraph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6. Software License for Example Code in this Book . . . . . . . . . . xi . xi . xii . xii . xiii . xiii . xiv 1 1 3 3 4 4 7 7 8 9 10 10 11 11 13 14 14 15 16 19
1. Introduction 1.1. Who is this Book Written For? . . . . . . . . . . . . . . . . . . . . . 1.2. Why a PDF Copy of this Book is Available Free on the Web . . . . . 1.3. Book Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.4. Why Graph Data Representations are Better than the Relational Database Model for Dealing with Rapidly Changing Data Requirements . . . . 1.5. What if You Use Other Programming Languages Other Than Lisp? . . 2. AllegroGraph Embedded Lisp Quick Start 2.1. Starting AllegroGraph . . . . . . . . . . . . . . . . . . . . . . . . 2.2. Working with RDF Data Stores . . . . . . . . . . . . . . . . . . . . 2.2.1. Creating Repositories . . . . . . . . . . . . . . . . . . . . . 2.2.2. AllegroGraph Lisp Reader Support for RDF . . . . . . . . . 2.2.3. Adding Triples . . . . . . . . . . . . . . . . . . . . . . . . 2.2.4. Fetching Triples by ID . . . . . . . . . . . . . . . . . . . . 2.2.5. Printing Triples . . . . . . . . . . . . . . . . . . . . . . . . 2.2.6. Using Cursors to Iterate Through Query Results . . . . . . . 2.2.7. Saving Triple Stores to Disk as XML, N-Triples, and N3 . . 2.3. AllegroGraphs Extensions to RDF . . . . . . . . . . . . . . . . . . 2.3.1. Examples Using Triple and Graph IDs . . . . . . . . . . . . 2.3.2. Support for Geo Location . . . . . . . . . . . . . . . . . . 2.3.3. Support for Free Text Indexing . . . . . . . . . . . . . . . . 2.3.4. Comparing AllegroGraph With Other Semantic Web Frameworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4. AllegroGraph Quickstart Wrap Up . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . .
. 20 . 21
iii
Contents
I.
23
25 27 30 30 31 31 32 33 33 34 36 37 37 38 38 41 41 44 46 46 46 47 49 49 50 50 51 51
3. RDF 3.1. RDF Examples in N-Triple and N3 Formats 3.2. The RDF Namespace . . . . . . . . . . . . 3.2.1. rdf:type . . . . . . . . . . . . . . . 3.2.2. rdf:Property . . . . . . . . . . . . . 3.3. Dereferenceable URIs . . . . . . . . . . . . 3.4. RDF Wrap Up . . . . . . . . . . . . . . . . 4. RDFS 4.1. Extending RDF with RDF Schema 4.2. Modeling with RDFS . . . . . . . 4.3. AllegroGraph RDFS++ Extensions 4.3.1. owl:sameAs . . . . . . . . 4.3.2. owl:inverseOf . . . . . . . 4.3.3. owl:TransitiveProperty . . 4.4. RDFS Wrapup . . . . . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
5. The SPARQL Query Language 5.1. Example RDF Data in N3 Format . . . . 5.2. Example SPARQL SELECT Queries . . . 5.3. Example SPARQL CONSTRUCT Queries 5.4. Example SPARQL ASK Queries . . . . . 5.5. Example SPARQL DESCRIBE Queries . 5.6. Wrapup . . . . . . . . . . . . . . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
6. RDFS++ and OWL 6.1. Properties Supported In RDFS++ . . . . . . . . 6.1.1. owl:sameAs . . . . . . . . . . . . . . . 6.1.2. owl:inverseOf . . . . . . . . . . . . . . 6.1.3. owl:TransitiveProperty . . . . . . . . . 6.2. RDF, RDFS, and RDFS++ Modeling Wrap Up
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
53
55 55 56 56 58
iv
Contents 8.2. 8.3. 8.4. 8.5. 8.6. Inferring New Triples: rdf:type vs. rdfs:subClassOf Example Using Inverse Properties . . . . . . . . . . . . . . . . . . . Using the Same As Property . . . . . . . . . . . . . . . . . Using the Transitive Property . . . . . . . . . . . . . . . . . Wrap Up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 61 63 63 65 67
11. Common Lisp Client Library for Open Calais 11.1. Open Calais Web Services Client . . . . . . . . 11.2. Storing Entity Data in an RDF Data Store . . . 11.3. Testing the Open Calais Demo System . . . . . 11.4. Open Calais Wrap Up . . . . . . . . . . . . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
12. Common Lisp Client Library for Natural Language Processing 85 12.1. KnowledgeBooks.com Natural Language Processing Library . . . . . 85 12.2. KnowledgeBooks Natural Language Processing Library Wrapup . . . 87 13. Common Lisp Client Library for Freebase 89 13.1. Overview of Freebase . . . . . . . . . . . . . . . . . . . . . . . . . . 89 13.2. Accessing Freebase from Common Lisp . . . . . . . . . . . . . . . . 91 13.3. Freebase Wrapup . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 14. Common Lisp Client Library for DBpedia 14.1. Interactively Querying DBpedia Using the Snorql Web Interface . . 14.2. Interactively Finding Useful DBpedia Resources Using the gFacet Browser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14.3. The lookup.dbpedia.org Web Service . . . . . . . . . . . . . . . . . 14.4. Using the AllegroGraph SPARQL Client Library to access DBpedia 14.5. DBpedia Wrapup . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 . 95 . . . . 97 97 99 100
15. Library for GeoNames 101 15.1. Using the cl-geonames Library . . . . . . . . . . . . . . . . . . . . . 101 15.2. Geonames Wrapup . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Contents
105
16. Semantic Web Portal Back End Services 107 16.1. Implementing the Back End APIs . . . . . . . . . . . . . . . . . . . 108 16.2. Unit Testing the Backend Code . . . . . . . . . . . . . . . . . . . . . 110 16.3. Backend Wrapup . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 17. Semantic Web Portal User Interface 17.1. Portable AllegroServe . . . . . . . . . . . 17.2. Layout of CLP les for Web Application . 17.3. Common Lisp Code for Web Application 17.4. Web Application Wrap Up . . . . . . . . 113 113 113 114 118
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
vi
List of Figures
1.1. Example Semantic Web Application . . . . . . . . . . . . . . . . . . 2
14.1. DBpedia Snorql Web Interface . . . . . . . . . . . . . . . . . . . . . 96 14.2. DBpedia Graph Facet Viewer . . . . . . . . . . . . . . . . . . . . . . 98 14.3. DBpedia Graph Facet Viewer after selecting a resource . . . . . . . . 98 17.1. Example Semantic Web Application Login Page . . . . . . . . . . . . 114 17.2. Example File Upload Page . . . . . . . . . . . . . . . . . . . . . . . 116 17.3. Example Application Search Page . . . . . . . . . . . . . . . . . . . 118
vii
List of Tables
13.1. Subset of Freebase API Arguments . . . . . . . . . . . . . . . . . . . 90
ix
Preface
This book is primarily intended to be a practical guide for using RDF data in information processing, linked data, and semantic web applications using the Common Lisp APIs for the AllegroGraph product. A second use for this book is to help you, the reader, set up an interactive Lisp development environment for writing knowledge intensive applications. So, while the Semantic Web applications using the AllegroGraph RDF data store is the main theme in this book, I will also cover using data from a variety of sources like Freebase, DBpedia and other public RDF repositories, use of statistical Natural Language Processing (NLP), and the GeoNames database and public web service.1
1. Getting started
I expect you to install the Franz Free Edition Common Lisp environment with the Free Edition of AllegroGraph to work through the examples in this book. If you own professional or enterprise licenses for these projects you can use that also. The free editions have restrictions on their use so read the license agreement. Download the Free Lisp Edition and then install AllegroGraph using: (require :update) (system.update:install-allegrograph) Whenever you start Lisp using evaluate the following form to load AllegroGraph: (require :agraph) I do not duplicate information in this book that appears in the documentation available on the Franz web site. I urge you to read how to set up an Emacs development environment.
1 The
geonames.org web service is limited to 2000 queries per hour from any single IP address. Commercial support is available, or, with some effort, you can also run GeoNames on your own server.
xi
Preface
the open source Portable AllegroServe and split-sequence libraries. example program that puts entities found in text into an RDF data store requires the AllegroGraph library. 4 Requires the Franz Portable AllegroServe client library. This is open source, but you will need to manually install Portable AllegroServe if you are using an alternative Common Lisp implementation (e.g., SBCL or Clozure Common Lisp). 5 Requires the AllegroGraph SPARQL client APIs but this code could be rewritten. 6 Requires several open source libraries that are included in the ZIP le for this books examples: cl-json, s-xml, split-sequence, usocket, trivial-gray-streams, exi-streams, chunga, cl-base64, puri, drakma, and cl-geonames.
xii
4. Information on the Companion Edition to this Book that Covers Java and JVM Languages When you browse through the directories containing the examples in this book you will notice that I use the convention of placing short snippets of test code that I often include in the book text in les named test.lisp and put longer examples in les named example.lisp. This book is organized in layers: 1. Quick introduction to the AllegroGraph and Franz Lisp. 2. Theory (with some AllegroGraph-specic short examples). 3. Detailed treatment of AllegroGraph APIs. 4. Development of Useful Common Lisp libraries information processing, and importing linked data sources like Freebase and Open Calais data to an AllegroGraph RDF store. 5. Development of a complete web portal using Semantic Web technologies.
4. Information on the Companion Edition to this Book that Covers Java and JVM Languages
This book has a companion edition that covers the use of both AllegroGraph and the open source Sesame project using JVM based languages like Java, Clojure, JRuby, and Scala. If you primarily work with JVM languages then you will likely be better off working through the other edition of this book. The JVM edition of this book offers some portability: I provide the basic functionality of AllegroGraph for RDF storage, SPARL queries, Geo Location, and free text search using the Sesame RDF data store and my own search and Geo Location libraries. The Free Java Edition of AllegroGraph does not have the commercial use restrictions that the Free Lisp Edition does. If you want a free commercial friendly Lisp development and deployment environment: I recommend that you use the companion book with the Clojure programming language.
5. AllegroGraph
AllegroGraph is written in Common Lisp and comes bundled in several different products:
xiii
Preface 1. As a standalone server that supports Lisp, Ruby, Java, Clojure, Scala, Common Lisp, and Python clients. A free version (limited to 50 million RDF triples - a large limit) that can be used for any purpose, including commercial use. 2. The WebView interface for exploring, querying, and managing AllegroGraph triple stores. WebView is standalone because it contains an embedded AllegroGraph server. 3. The Gruff for exploring, querying, and managing AllegroGraph triple stores using table and graph views. Gruff is standalone because it contains an embedded AllegroGraph server. 4. A library that is used embedded in Franz Common Lisp applications. A free version is available (with some limitations) for non-commercial use. This book uses AllegroGraph in embedded mode because this is the most agile way to experiment and learn the APIs. What you learn in this book is applicable to all of the AllegroGraph products. Currently, the server release is version 4.0 and the embedded library release is 3.3.
your convenience, I include in the code ZIP le third party libraries, most of which are released under MIT, BSD, Lisp LGPL, or Apache licenses. 8 downloading the free PDF from http://markwatson.com/opencontent does not give you the rights to this waiver.
xiv
1. Introduction
Franz has good online documentation1 for all of their AllegroGraph products. One purpose of this book is to provide a brief introduction to AllegroGraph but I assume that you also reference the documentation on the Franz web site. The broader purpose of this book is to provide application programming examples using AllegroGraph and Linked Data sources on the web. This book also covers some of my own open source Common Lisp projects that you may nd useful for Semantic Web applications. The combination of interactive Lisp development with embedded AllegroGraph and my utilities covered later should provide you with an agile development environment for writing knowledge based and semantic web applications. AllegroGraph is an RDF data repository that can use RDFS and RDFS+ inferencing. AllegroGraph also provides three non-standard extensions: 1. Text indexing and search 2. Geo Location support 3. Network traversal and search for social network applications
1. Introduction
RDF Reository
Application Program
RDF/RDFS/OWL APIs
There are many books, good tutorials and software about the Semantic Web on the web. However, there is not a single reference for developers who want to use the combination of Common Lisp and AllegroGraph for development using technologies like RDF/RDFS/OWL modeling, descriptive logic reasoners, and the SPARQL query language. If you own a Franz Lisp and AllegroGraph development license, then you are set to go. If not, you need to download and install a free edition copy at: http://www.franz.com/downloads/ You may also want to download and install the free versions of the AllegroGraph standalone server, Gruff, and WebView.2 Franz Inc. has provided support for my writing this book in the form of technical reviews and my understanding is that even though you will need to periodically refresh your free non-commercial license, there is no inherent time limit for non-commercial use. I would also like to thank Franz for providing me with an Enterprise developers license for my MacBook that I use for my own research and development projects.
2 I do not use these associated products in this book but I do in the Java,
of this book.
1.2. Why a PDF Copy of this Book is Available Free on the Web
1.2. Why a PDF Copy of this Book is Available Free on the Web
As an author I want to earn a living writing and have many people read and enjoy my books. By offering for sale the print version of this book I can earn some money for my efforts and also allow readers who can not afford to buy many books or may only be interested in a few chapters to read it from my web site. If you support my future writing projects by purchasing either the print or PDF version of this book, I thank you by offering you more exibility in the software license terms for the example programs and libraries I developed (see Section 6 in the Preface). Please note that I do not give permission to post the PDF version of this book on other peoples web sites: I consider this to be at least indirectly commercial exploitation in violation the Creative Commons License that I have chosen for this book.
1. Introduction 11. test data - miscellaneous test data les 12. utils - third party libraries3 that I use for the book examples 13. web app - both backend code from Chapter 16 and the front end web application code from Chapter 17
1.4. Why Graph Data Representations are Better than the Relational Database Model for Dealing with Rapidly Changing Data Requirements
When people are rst introduced to Semantic Web technologies their rst reaction is often something like, I can just do that with a database. The relational database model is an efcient way to express and work with slowly changing data models. There are some clever tools for dealing with data change requirements in the database world (ActiveRecord and migrations being a good example) but it is awkward to have end users and even developers tagging on new data attributes to relational database tables. A major theme in this book is convincing you that modeling data with RDF and RDFS facilitates freely extending data models and also allows fairly easy integration of data from different sources using different schemas without explicitly converting data from one schema to another for reuse. You will learn how to use the SPARQL query language to use information in different RDF repositories. It is also possible to publish relational data with a SPARQL interface. 4
1.5. What if You Use Other Programming Languages Other Than Lisp?
If you are a Java programmer, you probably still want to learn about AllegroGraph because Franz distributes a free Java version of AllegroCache that can be used for any purposes (including commercial applications) the free Java version is limited to 50 million RDF triples. The Java version is a natively compiled Franz Lisp application that provides plain socket and HTTP/REST interfaces.
3 cl-json,
s-xml, split-sequence, usocket, trivial-gray-streams, exi-streams, chunga, cl-base64, puri, drakma, and cl-geonames 4 The open source D2R project provides a wrapper for relational databases that provides a SPARQL query interface.
1.5. What if You Use Other Programming Languages Other Than Lisp? If you do most of your development in other languages like Ruby and Python then you can run the free server edition using the HTTP/Sesame client protocol. Sesame is a high quality batteries included Java library for Semantic Web development; the Sesame client protocol is well documented and simple to use but will not be covered here. If you use the Sesame protocol then you have the exibility of using both Franzs free server edition of AllegroGraph and Sesame which is open source with a BSD style license.
If you are a Windows user, follow the installation instructions on the AllegroGraph download web page and expect to see slight differences to the interactive example sessions that I use in this book.
This development copy of Allegro CL is licensed to: Trial User ;; Current reader case mode: :case-sensitive-lower cl-user(1): (require :agraph) AllegroGraph Lisp Edition 3.2 [built on March 16, 2009 15:05:15 t cl-user(2): (in-package :db.agraph.user) #<The db.agraph.user package> TRIPLE-STORE-USER(3): Please note that you will see many lines of output that I did not show. Here I required the :agraph package and changed the current Common Lisp package to db.agraph.user. In examples later in this book when we develop complete application examples we will be using our own application-specic packages and I will show you then what you need in general to import from db.agraph and db.agraph.user. We will continue this interactive example Lisp session in the following sections. I use interactive sessions in a command window for the examples in this book. If you are a Windows user then you will may want to alternatively try the Windows-specic IDE. I recommend that OS X, Linux, and Windows users use Emacs to develop Lisp code.2 If you run Franz Lisp in a terminal shell then I recommend that you start it using rlwrap. As an example, using OS X and Linux, I create an alias like: alias lisp=rlwrap alisp Using rlwrap lets you use the up arrow key to rerun previous commands, edit previous commands, etc.
provides their own Emacs tools: look for instructions for installing ELI. However, I also use the SLIME Emacs Lisp development tools that are compatible with all versions of Lisp that I use: Franz, SBCL, ClozureCL, and Gambit-C Scheme. Franz provides SLIME installation instructions for Franz Common Lisp
2.2. Working with RDF Data Stores While much of this book is specic to Common Lisp and AllegroGraph, the concepts that you will learn and experiment with can be useful if you also use other languages and platforms like Java (Sesame, Jena, OwlAPIs, etc.), Ruby (Redland RDF), etc. For Java developers Franz offers a Java version of AllegroGraph (implemented in Lisp with a network interface that also supports Python and Ruby clients) that I cover in the Java edition of this book.
2. AllegroGraph Embedded Lisp Quick Start xsd => http://www.w3.org/2001/XMLSchema# owl => http://www.w3.org/2002/07/owl# kb => http://knowledgebooks.com/rdfs# Here I created a new name space that has an abbreviation (or nickname) kb: and then printed out all registered namespaces. To insure data integrity be sure to call (close-triple-store) to close an RDF triple store when you are done with it. I leave the connection open because we will continue to use it in this chapter.
10
2.2. Working with RDF Data Stores The function add-triple takes three arguments for the subject, predicate, and object in a triple: TRIPLE-STORE-USER(18): (add-triple *demo-article* !rdf:type !kb:article) 1 TRIPLE-STORE-USER(19): (add-triple *demo-article* !kb:containsPerson !"Barack Obama") 2 We used a combination of a generated resource, two predicates dened in the rdf: and kb: namespaces, and a string literal to dene two triples. You notice that the function add-triple returns an integer as its value: this is a unique ID for the newly created triple.
11
2. AllegroGraph Embedded Lisp Quick Start <4: http://demo_news/12931 kb:containsPerson Barack Obama> <12931 containsPerson Barack Obama> TRIPLE-STORE-USER(24): (print-triple *triple*) <http://demo_news/12931> <http://knowledgebooks.com/rdfs#containsPerson> "Barack Obama" . <12931 containsPerson Barack Obama> Function print-triple prints a triple to standard output and returns the triple value in the short notation. We will see later in Section 2.2.6 how to create something like a database cursor for iterating through multiple triples that we nd by querying a triple store. For now we will use query function get-triples-list that returns all triples matching a query in a list. The utility function print-triples prints all triples in a list: TRIPLE-STORE-USER(27): (print-triples (list *triple*)) <http://demo_news/12931> <http://knowledgebooks.com/rdfs#containsPerson> "Barack Obama" . TRIPLE-STORE-USER(28): (print-triples (get-triples-list)) <http://demo_news/12931> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://knowledgebooks.com/rdfs#article> . <http://demo_news/12931> <http://knowledgebooks.com/rdfs#containsPerson> "Barack Obama" . When get-triples-list is called with no arguments it simply returns all triples in a data store. We can specify query matching values for any combination of :s, :p, and :o. We can look at all triples that have their subject equal to the resource we created for the demo article: TRIPLE-STORE-USER(31): (print-triples (get-triples-list :s *demo-article*)) <http://demo_news/12931> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://knowledgebooks.com/rdfs#article> . <http://demo_news/12931> <http://knowledgebooks.com/rdfs#containsPerson> "Barack Obama" . We can limit query results further; in this case we add the condition that the object must equal the value of the type !kb:article:
12
2.2. Working with RDF Data Stores TRIPLE-STORE-USER(33): (print-triples (get-triples-list :s *demo-article* :o !kb:article)) <http://demo_news/12931> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://knowledgebooks.com/rdfs#article> .
I often need to manually reformat program example text and example program output in this book. The last three lines in the last example would appear on a single line if you are following along with these tutorial examples in a Lisp listener (as you should be!). In any case, RDF triple data in the NTriple format that we are using here is free-format: a triple is dened by three tokens (each with no embedded whitespace unless inside a string literal) and ended with a period character.
TRIPLE-STORE-USER(39): (setq a-cursor (get-triples :s *demo-article*)) #<DB.AGRAPH::FILTERED-CURSOR #<DB.AGRAPH::ROW-CURSOR #<DB.AGRAPH::TRIPLE-RECORD-FILE @ #x113fd61a> ... #x11672082> @ #x1167219a> TRIPLE-STORE-USER(40): (while (cursor-next-p a-cursor) ; cursor-next returns a vector, not a triple: (print (cursor-next-row a-cursor))) <12931 type article> <12931 containsPerson Barack Obama> NIL TRIPLE-STORE-USER(41):
I usually nd it simpler to use the get-triples-list API that returns a list of results. I only use cursors when a query may return hundreds or thousands of results.
13
14
2.3. AllegroGraphs Extensions to RDF The subject, predicate, object, and graph value strings are uniquely stored in a global string table (like the symbol table a compiler uses) so that triples can more efciently store indices rather than complete strings. Storing just a single copy of each unique string also save memory and disk storage. Comparing string table indices is also much faster than storing string values.
15
2. AllegroGraph Embedded Lisp Quick Start In addition to queries based on values of subject, predicate, and object we can also lter results by specifying a value for the graph: ;; query on optional graph value: (print-triples (get-triples-list :g !"work-flow")) producing the output: <http://demo_news/12931> <http://knowledgebooks.com/rdfs#processed> "yes" . For the last three triples that we added to the triple store we used the optional :db keyword argument for function add-triple. Because we used the triple store stored in the global variable *db* using this optional keyword argument had no effect. However, it is possible to have multiple triple stores open at the same time so it can make sense to partition RDF data over multiple data stores on different servers. We will not concern ourselves in this book (except for mentioning it here) with AllegroGraphs client API functionality to access multiple distributed AllegroGraph servers. Franzs online documentation covers how to use the federation mechanism. The function add-triple returns as its value the newly created triples ID and has the side effect of adding the triple to the currently opened data store. While it is not best practice to use this unique internal AllegroGraph triple ID as a value referenced in another triple, there may be reasons in an application to store the IDs of newly created triples in order to be able to retrieve them from ID; for example: TRIPLE-STORE-USER(15): (get-triple-by-id 3) <12931 processed yes work-flow>
16
2.3. AllegroGraphs Extensions to RDF (register-namespace "g" "http://knowledgebooks.com/geo#") (create-triple-store "/tmp/geospatial-test") ;; define some locations in Verde Valley, Arizona: (defvar *locs* (("Verde_Valley_Ranger_Station" 34.7666667 -112.1416667) ("Verde_Valley_School" 34.8047596 -111.8060388) ("Sedona" 34.8697222 -111.7602778) ("Cottonwood" 34.739 -112.009) ("Jerome" 34.75 -112.11) ("Flagstaff" 35.20 -111.63) ("Clarkdale" 34.76 -112.05) ("Mount_Wilson" 35.996 -114.611) ("Tuzigoot" 34.56 -111.84))) Here I have dened a few locations in my area (in the mountains of Central Arizona) by latitude and longitude values. I will want to determine the minimum and maximum latitude and longitude in the data; the following simple map and reduce pattern does this: (defvar (defvar (defvar (defvar *min-lat* *max-lat* *min-lon* *max-lon* (reduce (reduce (reduce (reduce #min #max #min #max (mapcar (mapcar (mapcar (mapcar #cadr *locs*))) #cadr *locs*))) #caddr *locs*))) #caddr *locs*)))
The following code snippet creates and registers a new AllegroGraph Geo Spatial type based on the desired striping resolution and the minimum and maximum latitude and longitude values: ;; create a type: (setf offset 5.0) (flet ((fixup (num direction) (if (eq direction :min) (- num offset) (+ num offset)))) (setf *verde-valley-arizona* (db.agraph:register-latitude-striping-in-miles 3 :lat-min (fixup *min-lat* :min) :lat-max (fixup *max-lat* :max) :lon-min (fixup *min-lon* :min) :lon-max (fixup *max-lon* :max)))) (add-geospatial-subtype-to-db *verde-valley-arizona*)
17
2. AllegroGraph Embedded Lisp Quick Start After this setup we are ready to add latitude and longitude triples for each location: (dolist (loc *locs*) (let ((name (intern-resource (format nil "http://knowledgebooks.com/geo#a" (car loc))))) (print name) (add-triple name !g:isAt3 (longitude-latitude->upi *verde-valley-arizona* (caddr loc) (cadr loc))))) (index-all-triples :wait t) (print (count-cursor (get-triples-haversine-miles *verde-valley-arizona* !g:isAt3 -112.009 34.739 ; longitude and latitude 30.0)) ; distance in miles (dolist (distance (50.0 30.0 10.0 5.0)) (format t "%%Checking with distance = A%" distance) (let ((cursor (get-triples-haversine-miles *verde-valley-arizona* !g:isAt3 -112.009 34.739 distance))) (while (cursor-next-p cursor) (print (cursor-next-row cursor))))) In this example, I print out the number locations in the triple store within 30 miles of the location (-112.009 34.739) and a list of all locations within 50, 30, 10, and 5 miles of this same location. Here is the part of the output for distances of 10 and 5 miles: Checking with distance = 10.0 <Verde_Valley_Ranger_Station isAt3 +344559.99894-1120830.01273> <Jerome isAt3 +344500-1120636.00212> <Clarkdale isAt3 +344535.99394-1120300.01091> <Cottonwood isAt3 +344420.39424-1120032.40939> Checking with distance = 5.0
18
2.3. AllegroGraphs Extensions to RDF <Clarkdale isAt3 +344535.99394-1120300.01091> <Cottonwood isAt3 +344420.39424-1120032.40939>
19
2. AllegroGraph Embedded Lisp Quick Start !"Barack Obama") (add-triple *demo-article* !kb:containsPerson !"Bill Clinton") (add-triple *demo-article* !kb:containsPerson !"Bill Jones") The following uses the API freetext-get-ids that performs a free text search and returns all triple IDs that contain the query text; I then iterate over the results of a few additional example queries using cursors: (print (freetext-get-ids "Clinton")) (iterate-cursor (triple (freetext-get-triples (and "Bill" "Jones"))) (print triple)) (iterate-cursor (triple (freetext-get-triples "Bill")) (print triple)) (iterate-cursor (triple (freetext-get-triples (or "Jones" "Clinton"))) (print triple)) If I am not expecting many results for a text search query, then I prefer to use the API that returns all results at once in a list: (print (freetext-get-triples-list (or "Bill" "Barack"))) In this example I used the Lisp APIs for nding triples containing search terms. You will see in Chapter 5 how to use text search in SPARQL queries and in Chapter 9 I will show you how to use test search using Franzs Prolog query interface.
20
2.4. AllegroGraph Quickstart Wrap Up framework that is appropriate for applications written in Java. These alternatives have the advantage of being free to use but lack advantages of scalability and utility that a commercial product like AllegroGraph has.
21
Part I.
23
3. RDF
The Semantic Web is intended to provide a massive linked data set for use by software systems just as the World Wide Web provides a massive collection of linked web pages for human reading and browsing. The Semantic Web is like the World Wide Web in that anyone can generate any content that they want. This freedom to publish anything works for the web because we use our ability to understand natural language to interpret what we read and often to dismiss material that based upon our own knowledge we consider to be incorrect. The core concept for the Semantic Web is data integration and use from different sources. As we will soon see, the tools for implementing the Semantic Web are designed for encoding data and sharing data from many different sources. The Resource Description Framework (RDF) is used to encode information and the RDF Schema (RDFS) language denes properties and classes and also facilitates using data with different RDF encodings without the need to convert data to use different schemas. For example, no need to change a property name in one data set to match the semantically identical property name used in another data set. Instead, you can add an RDF statement that states that the two properties have the same meaning. I do not consider RDF data stores to be a replacement for relational databases but rather something that you will use with databases in your applications. RDF and relational databases solve difference problems. RDF is appropriate for sparse data representations that do not require inexible schemas. You are free to dene and use new properties and use these properties to make statements on existing resources. RDF offers more exibility: dening properties used with classes is similar to dening the columns in a relational database table. You do not need to dene properties for every instance of a class. This is analogous to a database table that can be missing columns for rows that do not have values for these columns (a sparse data representation). Furthermore, you can make ad hoc RDF statements about any resource without the need to update global schemas. We will use the SPARQL query language to access information in RDF data stores. SPARQL queries can contain optional matching clauses that work well with sparse data representations. RDF data was originally encoded as XML and intended for automated processing. In this chapter we will use two simple to read formats called N-Triples and N31 . There
1 N3
is a far better format to work with if you want to be able to read RDF data les and understand their contents. Currently AllegroGraph does not support N3 but Sesame does. I will usually use the
25
3. RDF are many tools available that can be used to convert between all RDF formats so we might as well use formats that are easier to read and understand. RDF data consists of a set of triple values: subject - this is a URI predicate - this is a URI object - this is either a URI or a literal value A statement in RDF is a triple composed of a subject, predicate, and object. A single resource containing a set of RDF triples can be referred to as an RDF graph. These resources might be a downloadable RDF le that you can load into AllegroGraph or Sesame, a web service that returns RDF data, or a SPARQL endpoint that is a web service that accepts SPARQL queries and returns information from an RDF data store. While we tend to think in terms of objects and classes when using object oriented programming languages, we need to readjust our thinking when dealing with knowledge assets on the web. Instead of thinking about objects we deal with resources that are specied by URIs. In this way resources can be uniquely dened. We will soon see how we can associate different namespaces with URI prexes this will make it easier to deal with different resources with the same name that can be found in different sources of information. While subjects will almost always be represented as URIs of resources, the object part of triples can be either URIs of resources or literal values. For literal values, the XML schema notation for specifying either a standard type like integer or string, or a custom type that is application domain specic. You have probably read articles and other books on the Semantic Web, and if so, you are probably used to seeing RDF expressed in its XML serialization format: you will not see XML serialization in this book. Much of my own confusion when I was starting to use Semantic Web technologies ten years ago was directly caused by trying to think about RDF in XML form. RDF data is graph data and serializing RDF as XML is confusing and a waste of time when either the N-Triple format or even better, the N3 format are so much easier to read and understand. Some of my work with Semantic Web technologies deals with processing news stories, extracting semantic information from the text, and storing it in RDF. I will use this application domain for the examples in this chapter. I deal with triples like: subject: a URI, for example the URL of a news article predicate: a relation like a persons name that is represented as a URI like
N3 format when discussing ideas but use the N-Triple format as input for example programs and for output when saving RDF data to les.
26
3.1. RDF Examples in N-Triple and N3 Formats <http://knowledgebooks.com/rdf/person/name>2 object: a literal value like Bill Clinton or a URI We will always use URIs3 as values for subjects and predicates, and use URIs or string literals as values for objects. In any case URIs are usually preferred to string literals because they are unique; for example, consider the two possible values for a triple object: Bill Clinton - as a string literal, the value may not refer to President Bill Clinton. <http://knowledgebooks.com/rdf/person#BillClinton> - as a URI, we can later make this URI a subject in a triple and use a relation to specify that this particular person had the job of President of the United States. We will see an example of this preferred use but rst we need to learn the N-Triple and N3 RDF formats.
like URLs, start with a protocol like HTTP that is followed by an internet domain.
3 Uniform Resource Identiers (URIs) are special in the sense that they (are supposed to) represent unique
things or ideas. As we will see in Chapter 10, URIs can also be dereferenceable in that we can treat them as URLs on the web and follow them using HTTP to get additional information about a URI. 4 We will model classes (or types) using RDFS and OWL but the difference is that an object in an OO language is explicitly declared to be a member of a class while a subject URI is considered to be in a class depending only on what properties it has. If we add a property and value to a subject URI then we may immediately change its RDFS or OWL class membership. 5 I think that there is some similarity between modeling with RDF and document oriented data stores like MongoDB or CouchDB where any document in the system can have any attribute added at any time. This is very similar to being able to add additional RDF statements that either add information about a subject URI or add another property and value that somehow narrows the meaning of a subject URI.
27
3. RDF http://knowledgebooks.com/ontology/#containsPerson The rst part of this URI is considered to be the namespace6 for (what we will use as a predicate) containsPerson. Once we associate an abbreviation like kb for http://knowledgebooks.com/ontology/ then we can just use the QName (quick name) with the namespace abbreviation; for example: kb:containsPerson Being able to dene abbreviation prexes for namespaces makes RDF and RDFS les shorter and easier to read. When different RDF triples use this same predicate, this is some assurance to us that all users of this predicate subscribe to the same meaning. Furthermore, we will see in Section 4.1 that we can use RDFS to state equivalency between this predicate (in the namespace http://knowledgebooks.com/ontology/) with predicates represented by different URIs used in other data sources. In an articial intelligence sense, software that we write does not understand a predicate like containsPerson in the way that a human reader can by combining understood common meanings for the words contains and person but for many interesting and useful types of applications that is ne as long as the predicate is used consistently. Because there are many sources of information about different resources the ability to dene different namespaces and associate them with unique URI prexes makes it easier to deal with situations. A statement in N-Triple format consists of three URIs (or string literals any combination) followed by a period to end the statement. While statements are often written one per line in a source le they can be broken across lines; it is the ending period which marks the end of a statement. The standard le extension for N-Triple format les is *.nt and the standard format for N3 format les is *.n3. My preference is to use N-Triple format les as output from programs that I write to save data as RDF. I often use either command line tools or the Java Sesame library to convert N-Triple les to N3 if I will be reading them or even hand editing them. You will see why I prefer the N3 format when we look at an example: @prefix kb: <http://knowledgebooks.com/ontology#> . <http://news.com/201234 /> kb:containsCountry "China"
6 You
have seen me use the domain knowledgebooks.com several times in examples. I have owned this domain and used it for business since 1998 and I use it here for convenience. I could just as well use example.com. That said, the advantage of using my own domain is that I then have the exibility to make this URI dereferenceable by adding an HTML document using this URI as a URL that describes what I mean by containsPerson. Even better, I could have my web server look at the request header and return RDF data if the requested content type was text/rdf
28
3.1. RDF Examples in N-Triple and N3 Formats Here we see the use of an abbreviation prex kb: for the namespace for my company KnowledgeBooks.com ontologies. The rst term in the RDF statement (the subject) is the URI of a news article. When we want to use a URL as a URI, we enclose it in angle brackets as in this example. The second term (the predicate) is containsCountry in the kb: namespace. The last item in the statement (the object) is a string literal China. I would describe this RDF statement in English as, The news article at URI http://news.com/201234 mentions the country China. This was a very simple N3 example which we will expand to show additional features of the N3 notation. As another example, suppose that this news article also mentions the USA. Instead of adding a whole new statement like this: @prefix kb: <http://knowledgebooks.com/ontology#> . <http://news.com/201234 /> kb:containsCountry "China" . <http://news.com/201234 /> kb:containsCountry "USA" . we can combine them using N3 notation. N3 allows us to collapse multiple RDF statements that share the same subject and optionally the same predicate: @prefix kb: <http://knowledgebooks.com/ontology#> . <http://news.com/201234 /> kb:containsCountry "China" , "USA" . We can also add in additional predicates that use the same subject: @prefix kb: <http://knowledgebooks.com/ontology#> .
<http://news.com/201234 /> kb:containsCountry "China" , "USA" . kb:containsOrganization "United Nations" ; kb:containsPerson "Ban Ki-moon" , "Gordon Brown" , "Hu Jintao" , "George W. Bush" , "Pervez Musharraf" , "Vladimir Putin" , "Mahmoud Ahmadinejad" . This single N3 statement represents ten individual RDF triples. Each section dening triples with the same subject and predicate have objects separated by commas and ending with a period. Please note that whatever RDF storage system we use (we will be using AllegroGraph) it makes no difference if we load RDF as XML, N-Triple, of N3 format les: internally subject, predicate, and object triples are stored in the same way and are used in the same way.
29
3. RDF I promised you that the data in RDF data stores was easy to extend. As an example, let us assume that we have written software that is able to read online news articles and create RDF data that captures some of the semantics in the articles. If we extend our program to also recognize dates when the articles are published, we can simply reprocess articles and for each article add a triple to our RDF data store using the N-Triple format to set a publication date7 . <http://news.com/2034 /> kb:datePublished "2008-05-11" . Furthermore, if we do not have dates for all news articles that is often acceptable depending on the application.
3.2.1. rdf:type
The rdf:type property is used to specify the type (or class) of a resource. Notice that we do not capitalize type because by convention we do not capitalize RDF property names. Here is an example in N3 format (with long lines split to t the page width): @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix kb: <http://knowledgebooks.com/rdf/publication#> . <http://demo_news/12931> rdf:type kb:article .
7 This
example is pedantic since we can apply XML Scehma (XSL) data types to literal string values, this could be more accurately specied as 2008-05-11@http://www.w3.org/2001/XMLSchema#date
30
3.3. Dereferenceable URIs Here we are converting the URL of a news web page to a resource and then dening a new triple that species the web page resource is or type kb:article (again, using the QName kb: for my knowledgebooks.com namespace).
3.2.2. rdf:Property
The rdf:Property class is, as you might guess from its name, used to describe and dene properties. Notice that Property is capitalized because by convention we capitalize RDF class names. This is a good place to show how we dene new properties, using a previous example: @prefix kbcontains: <http://knowledgebooks.com/rdf/contains#> . <http://demo_news/12931> kbcontains:person "Barack Obama" . I might make an additional statement about this URI stating that it is a property: kbcontains:person rdf:type rdf:Property . When we discuss RDF Schema (RDFS) in Chapter 4 we will see how to create subtypes and sub-properties.
31
3. RDF
8 AllegroGraph
also stores a unique integer triple ID and a graph ID for partitioning RDF data and to support graph operations. While using the triple ID and graph ID can be useful, my own preference is to stick with using just what is in the RDF standard.
32
4. RDFS
The World Wide Web Consortium RDF Schema (RDFS) denition can be read at http://www.w3.org/TR/rdf-schema/ and I recommend that you use this as a reference because I will discuss only the parts of RDFS that are required for implementing the examples in this book. The RDFS namespace http://www.w3.org/2000/01/rdfschema# is usually registered with the QName rdfs: and I will use this convention1 .
1 The actual namespace abbreviations that you use have no effect as long as you consistently use whatever
QName values you set for URIs in the RDF statements that use the abbreviations.
33
kb:containsCity rdfs:subPropertyOf kb:containsPlace . kb:containsCountry rdfs:subPropertyOf kb:containsPlace . kb:containsState rdfs:subPropertyOf kb:containsPlace . The last three lines that are themselves valid RDF triples declare that: The property containsCity is a subproperty of containsPlace. The property containsCountry is a subproperty of containsPlace. The property containsState is a subproperty of containsPlace. Why is this useful? For at least two reasons: You can query an RDF data store for all triples that use property containsPlace and also match triples with property equal to containsCity, containsCountry, or containsState. There may not even be any triples that explicitly use the property containsPlace. Consider a hypothetical case where you are using two different RDF data stores that use different properties for naming cities: cityName and city. You can dene cityName to be a subproperty of city and then write all queries against the single property name city. This removes the necessity to convert data from different sources to use the same Schema. In addition to providing a vocabulary for describing properties and class membership by properties, RDFS is also used for logical inference to infer new triples, combine data from different RDF data sources, and to allow effective querying of RDF data stores. We will see examples of more RDFS features in Chapter 5 when we perform SPARQL queries.
is a Franz extension to RDFS that adds some parts of OWL. I cover RDFS++ in some detail in the Lisp Edition of this book and mention some aspects of RDFS++ in Section 4.3, the Java Edition.
34
4.2. Modeling with RDFS Here is a short example on using RDFS to extend RDF (assume that my namespace kb: and the RDFS namespace rdfs: are dened): kb:Person rdf:type rdfs:Class . kb:Person rdfs:comment "represents a human" . kb:Manager rdf:type kb:Person . kb:Manager rdfs:domain kb:Person . kb:Engineer rdf:type kb:Person . kb:Engineer rdfs:domain kb:Person . Here we see the use of rdfs:comment used to add a human readable comment to the new class kb:Person. When we dene the new classes kb:Manager and kb:Engineer we make them subclasses of kb:Person instead of the top level super class rdfs:Class. We will look at examples later in that that demonstrate the utility of models using class hierarchies and hierarchies of properties for now it is enough to introduce the notation. The rdfs:domain of an rdf:property species the class of the subject in a triple while rdfs:range of an rdf:property species the class of the object in a triple. Just as strongly typed programming languages like Java help catch errors by performing type analysis, creating (or using existing) good RDFS property and class denitions helps RDFS, RDFS++, and OWL descriptive logic reasoners to catch modeling and data denition errors. These denitions also help reasoning systems infer new triples that are not explicitly dened in a triple data store. We continue the current example by adding property denitions and then asserting a triple that is valid given the type and property restrictions that we have dened using RDFS: kb:supervisorOf rdfs:domain kb:Manager . kb:supervisorOf rdfs:range kb:Engineer . "Mary Johnson" rdf:type kb:Manager . "John Smith rdf:type kb:Engineer . "Mary Johnson" kb:supervisorOf "John Smith" . If I tried to add a triple with Mary Johnson and John Smith reversed in the last RFD statement then an RDFS inference/reasoning system could catch the error. This example is not ideal because I am using string literals as the subjects in triples. In general, you probably want to dene a specic namespace for concrete resources representing entities like the people in this example. The property rdfs:subClassOf is used to state that all instances of one class are also instances of another class. The property rdfs:subPropertyOf is used to state that
35
4. RDFS all resources related by one property are also related by another; for example, given the following N3 statements that use string literals as resources to make this example shorter: kb:familyMember rdf:type rdf:Property . kb:ancestorOf rdf:type rdf:Property . kb:parentOf rdf:type rdf:Property . kb:ancestorOf rdfs:subPropertyOf kb:familyMember . kb:parentOf rdfs:subPropertyOf kb:ancestorOf . "Marry Smith" kb:parentOf "Sam" . then the following is valid: "Marry Smith" kb:ancestorOf "Sam" . "Marry Smith" kb:familyMember "Sam" . We have just seen that a common use of RDFS is to dene additional application or data-source specic properties and classes in order to express relationships between resources and the types of resources. Whenever possible you will want to reuse existing RDFS properties and resources that you nd on the web. For instance, in the last example I dened my own subclass kb:person instead of using the Friend of a Friend (FOAF) namespaces denition of person. I did this for pedantic reasons: I wanted to show you how to dene your own classes and properties.
36
4.3. AllegroGraph RDFS++ Extensions owl:sameAs owl:inverseOf owl:TransitiveProperty We will now discuss owl:sameAs, owl:inverseOf, and owl:TransitiveProperty to complete the discussion of frequently used RDFS predicates seen earlier in this Chapter.
4.3.1. owl:sameAs
If the same entity is represented by two distinct URIs owl:sameAs can be used to assert that the URIs refer to the same entity. For example, two different knowledge sources might might dene different URIs in their own namespaces for President Barack Obama. Rather than translate data from one knowledge source to another it is simpler to equate the two unique URIs. For example: kb:BillClinton rdf:type kb:Person . kb:BillClinton owl:sameAs mynews:WilliamClinton Then the following can be veried using an RDFS++ or OWL DL capable reasoner: mynews:WilliamClinton rdf:type kb:Person .
4.3.2. owl:inverseOf
We can use owl:inverseOf to declare that one property is the inverse of another. :parentOf owl:inverseOf :childOf . "John Smith" :parentOf "Nellie Smith" . There is something new in this example: I am using a default namespace for :parentOf and :childOf. A default namespace is assumed to be application specic and that no external software agents will need access to resources dened in the default namespace. Given the two previous RDF statements we can infer that the following is also true: "Nellie Smith" :childOf "John Smith" .
37
4. RDFS
4.3.3. owl:TransitiveProperty
As its name implies owl:TransitiveProperty is used to declare that a property is transitive as the following example shows: kb:ancestorOf a rdf:Property . "John Smith" kb:ancestorOf "Nellie Smith" . "Nellie Smith" kb:ancestorOf "Billie Smith" . There is something new in this example: in N3 you can use a as shorthand for rdf:type. Given the last three RDF statements we can infer that: "John Smith" : kb:ancestorOf "Billie Smith" .
can download SwiftOWLIM or BigOWLIM at http://www.ontotext.com/owlim/ and use either as a SAIL backend repository to get OWL reasoning capability.
38
4.4. RDFS Wrapup standard distribution in the future. My advice is to start building applications with RDF and RDFS with a view to using OWL as the need arises. If you are using AllegroGraph for your application development then certainly use the RDFS++ extensions if RDFS is too limited for your applications. We have been using SPARQL in examples and in the next chapter we will look at SPARQL in some detail.
39
41
5. The SPARQL Query Language kb:containsCountry rdfs:subPropertyOf kb:containsPlace . kb:containsState rdfs:subPropertyOf kb:containsPlace . <http://yahoo.com/20080616/usa_flooding_dc_16 /> kb:containsCity "Burlington" , "Denver" , "St. Paul" ," Chicago" , "Quincy" , "CHICAGO" , "Iowa City" ; kb:containsRegion "U.S. Midwest" , "Midwest" ; kb:containsCountry "United States" , "Japan" ; kb:containsState "Minnesota" , "Illinois" , "Mississippi" , "Iowa" ; kb:containsOrganization "National Guard" , "U.S. Department of Agriculture" , "White House" , "Chicago Board of Trade" , "Department of Transportation" ; kb:containsPerson "Dena Gray-Fisher" , "Donald Miller" , "Glenn Hollander" , "Rich Feltes" , "George W. Bush" ; kb:containsIndustryTerm "food inflation" , "food" , "finance ministers" , "oil" . <http://yahoo.com/78325/ts_nm/usa_politics_dc_2 /> kb:containsCity "Washington" , "Baghdad" , "Arlington" , "Flint" ; kb:containsCountry "United States" , "Afghanistan" , "Iraq" ; kb:containsState "Illinois" , "Virginia" , "Arizona" , "Michigan" ; kb:containsOrganization "White House" , "Obama administration" , "Iraqi government" ; kb:containsPerson "David Petraeus" , "John McCain" , "Hoshiyar Zebari" , "Barack Obama" , "George W. Bush" , "Carly Fiorina" ; kb:containsIndustryTerm "oil prices" .
42
<http://yahoo.com/10944/ts_nm/worldleaders_dc_1 /> kb:containsCity "WASHINGTON" ; kb:containsCountry "United States" , "Pakistan" , "Islamic Republic of Iran" ; kb:containsState "Maryland" ; kb:containsOrganization "University of Maryland" , "United Nations" ; kb:containsPerson "Ban Ki-moon" , "Gordon Brown" , "Hu Jintao" , "George W. Bush" , "Pervez Musharraf" , "Vladimir Putin" , "Steven Kull" , "Mahmoud Ahmadinejad" . <http://yahoo.com/10622/global_economy_dc_4 /> kb:containsCity "Sao Paulo" , "Kuala Lumpur" ; kb:containsRegion "Midwest" ; kb:containsCountry "United States" , "Britain" , "Saudi Arabia" , "Spain" , "Italy" , India" , ""France" , "Canada" , "Russia" , "Germany" , "China" , "Japan" , "South Korea" ; kb:containsOrganization "Federal Reserve Bank" , "European Union" , "European Central Bank" , "European Commission" ; kb:containsPerson "Lee Myung-bak" , "Rajat Nag" , "Luiz Inacio Lula da Silva" , "Jeffrey Lacker" ; kb:containsCompany "Development Bank Managing" , "Reuters" , "Richmond Federal Reserve Bank" ; kb:containsIndustryTerm "central bank" , "food" , "energy costs" , "finance ministers" , "crude oil prices" , "oil prices" , "oil shock" , "food prices" , "Finance ministers" , "Oil prices" , "oil" .
43
44
5.2. Example SPARQL SELECT Queries WHERE { ?subject kb:containsOrganization ?object FILTER regex(?object, "University") . }
Prior to this last example query we only requested that the query return values for subject and predicate for triples that matched the query. However, we might want to return all triples whose subject (in this case a news article URI) is in one of the matched triples. Note that there are two matching triples, each terminated with a period:
PREFIX kb: <http://knowledgebooks.com/ontology#> SELECT ?subject ?a_predicate ?an_object WHERE { ?subject kb:containsOrganization ?object FILTER regex(?object, "University") . ?subject ?a_predicate ?an_object . } DISTINCT ORDER BY ?a_predicate ?an_object LIMIT 10 OFFSET 5
When WHERE clauses contain more than one triple pattern to match, this is equivalent to a Boolean and operation. The DISTINCT clause removes duplicate results. The ORDER BY clause sorts the output in alphabetical order: in this case rst by predicate (containsCity, containsCountry, etc.) and then by object. The LIMIT modier limits the number of results returned and the OFFSET modier sets the number of matching results to skip. We are nished with our quick tutorial on using the SELECT query form. There are three other query forms that I will now briey1 cover: CONSTRUCT returns a new RDF graph of query results ASK returns Boolean true or false indicating if a query matches any triples DESCRIBE returns a new RDF graph containing matched resources
1I
45
46
5.6. Wrapup
5.6. Wrapup
This chapter ends the background material on Semantic Web Technologies. The remaining chapters in this book will be examples of gathering useful linked data and using it in applications.
47
49
6. RDFS++ and OWL owl:inverseOf owl:TransitiveProperty We will now discuss owl:sameAs, owl:inverseOf, and owl:TransitiveProperty. We will see in Chapters 7 and 8 interactive examples of these predicates.
6.1.1. owl:sameAs
If the same entity is represented by two distinct URIs owl:sameAs can be used to assert that the URIs refer to the same entity. For example, two different knowledge sources might might dene different URIs in their own namespaces for President Bill Clinton. Rather than translate data from one knowledge source to another it is simpler to equate the two unique URIs. For example:
Then the following can be veried using a RDFS++ or OWL DL capable reasoner:
6.1.2. owl:inverseOf
We can use owl:inverseOf to declare that one property is the inverse of another.
There is something new in this example: I am using a default namespace for :parentOf and :childOf. A default namespace is assumed to be application specic and that no external software agents will need access to resources dened in the default namespace. Given the two previous RDF statements we can infer that the following is also true:
50
6.1.3. owl:TransitiveProperty
As its name implies owl:TransitiveProperty is used to declare that a property is transitive as the following example shows: kb:ancestorOf a owl:TransitiveProperty . "John Smith" kb:ancestorOf "Nellie Smith" . "Nellie Smith" kb:ancestorOf "Billie Smith" . There is something new in this example: in N3 you can use a as shorthand for rdf:type. Given the last three RDF statements we can infer that: "John Smith" : kb:ancestorOf "Billie Smith" .
51
Part II.
53
55
56
7.3. Lisp APIs for Queires (defvar *r1* (sparql:run-sparql " PREFIX kb: <http:://knowledgebooks.com/ontology#> SELECT ?article_uri ?city_name WHERE { ?article_uri kb:containsCity ?city_name . }" :results-format :hashes)) (dolist (result *r1*) (maphash #(lambda (key value) (format t " key: S% value: S%%" key value)) result)) The output from this code snippet is: key: |?article_uri| value: {http://news.yahoo.com/...} key: |?city_name| value: {Burlington} ;; etc. The SPARQL ASK command checks to see if a given query produces any results. The following example request a Lisp true/false return value to the question Does any article contain the city Chicago?: (sparql:run-sparql " PREFIX kb: <http:://knowledgebooks.com/ontology#> ASK { ?any_article kb:containsCity Chicago }" :results-format :boolean) There are many possible options for the :results-format keyword argument, including: :sparql-xml serializes the results as XML to output-stream :sparql-json serializes the results as JSON data to output-stream :sparql-ttl serializes the results as Turtle encoding to output-stream (Turtle is a simplied version of N3, like N-Triples with namespaces)
57
7. SPARQL Queries Using AllegroGraph APIs :hashes returns a list of hash tables (as seen in a previous example) :arrays returns a list of arrays for each results :lists returns a list of lists for each results :count returns an integer for the number of results At some loss of efciency it is sometimes useful to match string values against regular expressions; for example: (sparql:run-sparql " PREFIX kb: <http:://knowledgebooks.com/ontology#> SELECT ?article_uri WHERE { ?article_uri kb:containsPerson ?person_name . FILTER regex(?person_name, *Putin*) }" :results-format :lists) ;; output: (({http://news.yahoo.com/s/nm/20080616/ts_nm/worldleaders /}))
7.4. Wrap Up
You learned how to query RDF triples in a repository using the Lisp AllegroGraph APIs in this chapter. We only considered triples that we explicitly added to the triple store and in later chapters we will automate the collection of data from the Internet and convert it to RDF and add it to a local triple store for reuse. Much of the power of Semantic Web technologies in general and AllegroGraph in particular is the ability to use triples that are inferred from RDFS without being explicitly created. This capability is covered in the next chapter in addition to techniques for using different data sources implemented using different schemas.
58
59
8. AllegroGraph Reasoning System (apply-rdfs++-reasoner :db *db*) This function works via side effect: the specied data store is converted to support inferencing. Since the default database *db* can be assumed, this can be shortened to: (apply-rdfs++-reasoner) If you use multiple data stores at the same time you can use different inference support for each. The remainder of this chapter uses reasoning to infer1 new information.
of RDF triple stores that support RDFS, RDFS++, or OWL reasoning can implement inferred triples in different ways. One approach is to pre-calculate inferred triples using forward chaining inference; this approach is used by the Sesame library. A different approach used in Allegrograph is to infer triples at query time. The results should (hopefully) be the same.
60
8.3. Using Inverse Properties (sparql:run-sparql " PREFIX kb: <http:://knowledgebooks.com/ontology#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> ASK { kb:sam rdf:type kb:person }" :results-format :boolean) This, however, returns a Lisp false value NIL. To get what you probably thought was the expected subclass behavior we can use rdfs:subClassOf: (add-triple !kb:man !rdfs:subClassOf !kb:person) Now the last query returns a true (Lisp T) value.
61
8. AllegroGraph Reasoning System <?xml version="1.0"?> <sparql xmlns="http://www.w3.org/2005/sparql-results#"> <head> <variable name="y"/> <variable name="x"/> </head> <results> <result> <binding name="y"> <uri>http:://knowledgebooks.com/ontology#Carol</uri> </binding> <binding name="x"> <uri>http:://knowledgebooks.com/ontology#Mark</uri> </binding> </result> </results> </sparql> I nd other output formats generally easier to use; for example specifying :resultsformat :lists yields: (({Carol} {Mark})) and specifying :results-format :hashes like:
(sparql:run-sparql " PREFIX kb: <http:://knowledgebooks.com/ontology#> SELECT ?y ?x WHERE { ?y kb:wife-of ?x }" :results-format :hashes will yield:
Specifying an output format of :results-format :sparql-json yields: { "head" : { "vars" : ["y", "x"] }, "results" : { "bindings" : [
62
8.4. Using the Same As Property { "y":{"type":"uri", "value": "http:://knowledgebooks.com/ontology#Carol"}, "x":{"type":"uri", "value": "http:://knowledgebooks.com/ontology#Mark"} } ] } } Yet another output option is :results-format :alists
(add-triple !kb:Mark !kb:name !kb:"Mark Watson") (register-namespace "test_news" "http://news.yahoo.com/s/nm/200806 (add-triple !test_news:Mark !kb:height !"6 feet 4 inches") (add-triple !kb:mark !owl:sameAs !kb:test_news:Mark) (apply-rdfs++-reasoner) You will be surprised to see only the following results: (({name} {Mark Watson}) ({sameAs} {test_news:Mark})) With just RDFS++ reasoning, the height of Mark will not be inferred. It would have been using a full OWL reasoner.
63
8. AllegroGraph Reasoning System (add-triple !kb:Mark !kb:relativeOf !kb:Ron) (add-triple !kb:Ron !kb:relativeOf !kb:Julia) (add-triple !kb:Julia !kb:relativeOf !kb:Ken) And run a query like: (sparql:run-sparql " PREFIX kb: <http:://knowledgebooks.com/ontology#> SELECT ?relative WHERE { kb:Mark kb:relativeOf ?relative }" :results-format :sparql-json)) If you forget to enable reasoning then you will get results just using RDF + RDFS that you do not expect: {
"head" : { "vars" : ["relative"] }, "results" : { "bindings" : [ { "relative":{"type":"uri", "value": "http:://knowledgebooks.com/ontology#Ron" } ] } } We need to enable reasoning with: (apply-rdfs++-reasoner) and we then get all expected relatives listed: { "head" : { "vars" : ["relative"] }, "results" : { "bindings" : [
64
8.6. Wrap Up
We saw in the same as example that RDFS++ does not always make inferences that we might expect, so it is best to test reasoning that you depend on for an application with a small example. At this point, you know how to use AllegroGraph and Lisp to write your ow applications. I am going to take a short tangent in the next chapter to show you a short-hand Prolog notation for using AllegroGraph embedded in Lisp applications.
65
Prolog interface is based on Peter Norvigs Prolog implementation written in Common Lisp.
67
9. AllegroGraph Prolog Interface (select (?s ?p ?o) (q- ?s ?p ?o)) This query contains no conditions so every triple is displayed. In Prolog, terms starting with a ? are variables that will later get value bindings. The select Lisp macro is used to perform a query and return results in a convenient Lisp list notation. Each q- term in a query is used to dene variables and optionally conditions. You will see the close correspondence with SPARQL queries as we look at more examples. The output showing all N-Triples in the example data le looks like: (("http://kbsportal.com/oak_creek_flooding /" "http://knowledgebooks.com/ontology/#storyType" "http://knowledgebooks.com/ontology/#disaster") ("http://kbsportal.com/oak_creek_flooding /" "http://knowledgebooks.com/ontology/#summary" "Oak Creek flooded last week affecting 5 businesses") ("http://kbsportal.com/bear_mountain_fire /" "http://knowledgebooks.com/ontology/#storyType" "http://knowledgebooks.com/ontology/#disaster") ("http://kbsportal.com/bear_mountain_fire /" "http://knowledgebooks.com/ontology/#summary" "The fire on Bear Mountain was caused by lightening") ("http://kbsportal.com/trout_season /" "http://knowledgebooks.com/ontology/#storyType" "http://knowledgebooks.com/ontology/#sports") ("http://kbsportal.com/trout_season /" "http://knowledgebooks.com/ontology/#storyType" "http://knowledgebooks.com/ontology/#recreation") ("http://kbsportal.com/trout_season /" "http://knowledgebooks.com/ontology/#summary" "Trout fishing season started last weekend") ("http://kbsportal.com/jc_basketball /" "http://knowledgebooks.com/ontology/#storyType" "http://knowledgebooks.com/ontology/#sports")) We can rene this example query by only requesting news stories that have a summary: (select (?news_uri ?summary) (q- ?news_uri !kb:summary ?summary)) Now the results are:
68
(("http://kbsportal.com/oak_creek_flooding /" "Oak Creek flooded last week affecting 5 businesses") ("http://kbsportal.com/bear_mountain_fire /" "The fire on Bear Mountain was caused by lightening") ("http://kbsportal.com/trout_season /" "Trout fishing season started last weekend")) If we are only interested in news stories of type disaster, then we can add another condition ltering against the story type: (select (?news_uri ?summary) (q- ?news_uri !kb:summary ?summary) (q- ?news_uri !kb:storyType !kb:disaster)) Now we only get two results: (("http://kbsportal.com/oak_creek_flooding "Oak Creek flooded last week affecting 5 ("http://kbsportal.com/bear_mountain_fire "The fire on Bear Mountain was caused by /" businesses") /" lightening"))
The Franz Prolog interface tutorial and reference web pages also show examples of performing RDFS++ type inference and further Prolog techniques. Since we will not use the Prolog interface in application examples in this book I refer you to the Franz documentation if you are interested in using the Prolog interface.
69
Part III.
71
It has been a decade since Tim Berners-Lee started writing about version 2 of the World Wide Web: the Semantic Web. His new idea was to augment HTML anchor links with typed links using RDF data. As we have seen in detail in the last several chapters, RDF is encoded as data triples with the parts of each triple identied as the subject, predicate, and object. The predicate identies the type of link between the subject and the object in a RDF triple. You can think of a single RDF graph as being hosted in one web service, SPARQL endpoint service, or a downloadable set of RDF les. Just as the value of the web is greatly increased with relevant links between web pages, the value of RDF graphs is increased when they contain references to triples in other RDF graphs. In theory, you could think of all linked RDF data that is reachable on the web as being a single graph but in practice graphs with billions of nodes are difcult to work with. That said, handling very large graphs is an active area of research both in university labs and in industry. URIs refer to things, acting as a unique identier. An important idea is that URIs in linked data sources can also be dereferenceable: a URI can serve as a unique identier for the Semantic Web and if you follow the link you can nd HTML, RDF or any document type that might better inform both human readers and software agents. Typically, a dereferenceable URI is followed by using the HTTP protocols GET method. The idea of linking data resources using RDF extends the web so that both human readers and software agents can use data resources. In Tim Berners-Lees 2009 TED talk on Linked Data he discusses the importance of getting governments, companies and individuals to share Linked Data and to not keep it private. He makes the great point that the world has many challenges (medicine, stabilizing the economy, energy efciency, etc.) that can benet from unlocked Linked Data sources.
73
an example, for peoples names, addresses, etc. under a Creative Commons License at http://patterns.dataincubator.org/book/
74
10.3. Will Linked Data Become the Semantic Web? <http://knowledgebooks.com/rdf/datasource/psql/ \\ testdb/testtable/21198> For all of these examples (properties and resources) it would be good practice to make these URIs dereferenceable.
75
a line of debug printout for the response returned by the web service call.
77
11. Common Lisp Client Library for Open Calais I will not list all of the code in opencalais-lib.lisp but we will look at some of it. I start by dening two constant values, the rst depends on your setting of the environment variable OPEN CALAIS KEY: (defvar *my-opencalais-key* (sys::getenv "OPEN_CALAIS_KEY")) (defvar *PARAMS* (concatenate string "¶msXML=" (MAKE-ESCAPED-STRING "<c:params ... >.....</c:params>"))) The web services client function is fairly trivial: we just need to make a RESTful web services call and extract the text form the comment block, parsing out the named entities and their values. Before we look at some code, we will jump ahead and look at an example comment block; understanding the input data will make the code easier to follow: <!--Relations: PersonCommunication, PersonPolitical, PersonTravel Company: IBM, Pepsi Country: France Person: Hiliary Clinton, John Smith ProvinceOrState: California--> We will use the net.aserve.client:do-http-request function to make the web service call after setting up the RESTful arguments: (defun entities-from-opencalais-query (query &aux url results index1 index2 lines tokens hash) (setf hash (make-hash-table :test #equal)) (setf url (concatenate string "http://api.opencalais.com/enlighten" "/calais.asmx/Enlighten?" "licenseID=" *my-opencalais-key* "&content=" (MAKE-ESCAPED-STRING query)
78
11.1. Open Calais Web Services Client *PARAMS*)) (setf results (net.aserve.client:do-http-request url)) The value of results will be URL-encoded text and for our purposes there is no need to decode the text returned from the web service call: (setq index1 (search "terms of service.-->" results)) (setq index1 (search "<!--" results :start2 index1)) (setq index2 (search "-->" results :start2 index1)) (setq results (subseq results (+ index1 7) index2)) (setq lines (split-sequence:split-sequence #\Newline results)) (dolist (line lines) (setq index1 (search ": " line)) (if index1 (let ((key (subseq line 0 index1)) (values (split-sequence:split-sequence ", " (subseq line (+ index1 2))))) (if (not (string-equal "Relations" key)) (setf (gethash key hash) values))))) (maphash #(lambda (key val) (format t "key: S val: S%" key val)) hash) hash) Before using this utility function in the next section to fetch data for an RDF data store we will look at a simple test: (entities-from-opencalais-query "Senator Hiliary Clinton spoke with the president of France. Clinton and John Smith talked on the aiplane going to California. IBM and Pepsi contributed to Clintons campaign.") The debug printout in this call is: key: key: key: key: "Country" val: ("France") "Person" val: ("Hiliary Clinton" "John Smith") "Company" val: ("IBM" "Pepsi") "ProvinceOrState" val: ("California")
79
80
11.3. Testing the Open Calais Demo System Function get-rdf-predicate-from-entity-type is used to map string literals to specic predicates dened in the knowledgebooks.com namespace. The following function is the utility for processing the text from documents and generating multiple triples that all have their subject equal to the value of the unique URI for the original document. (defun add-entities-to-rdf-store (subject-uri text) "subject-uri if the subject for triples that this function defines" (maphash #(lambda (key val) (dolist (entity-val val) (add-triple subject-uri (get-rdf-predicate-from-entity-type key) (literal entity-val)))) (entities-from-opencalais-query text))) If documents are not plain text (for example a word processing le or a HTML web page) then applications using the utility code developed in this chapter need to extract plain text. The code in this section is intended to give you ideas for your own applications; you would at least substitute your own namespace(s) for your application.
81
11. Common Lisp Client Library for Open Calais We can print all triples in the data store: (print-triples (get-triples-list) :format :concise) and output is: <1: http://newsdemo.com/1234 kb:containsCountry France> <2: http://newsdemo.com/1234 kb:containsPerson Hiliary Clinton> <3: http://newsdemo.com/1234 kb:containsPerson John Smith> <4: http://newsdemo.com/1234 kb:containsCompany IBM> <5: http://newsdemo.com/1234 kb:containsCompany Pepsi> <6: http://newsdemo.com/1234 kb:containsState California> This example showed just adding triples generated from a single document. If a large number of documents are processed then queries like the following might be useful: (print-triples (get-triples-list :p (get-rdf-predicate-from-entity-type "Company") :o (literal "Pepsi")) :format :concise) producing this output: <5: http://newsdemo.com/1234 kb:containsCompany Pepsi> Here we identify all documents that mention a specic company.
82
11.4. Open Calais Wrap Up I consider Open Calais to be state-of-the-art in its ability to accurately determine entities in input text. In the next chapter I will show you my own Common Lisp library that I use when I do not want to depend on accessing a third party web service like Open Calais.
83
85
12. Common Lisp Client Library for Natural Language Processing (push "knowledgebooks_nlp/" asdf:*central-registry*) (asdf:operate asdf:load-op :kbnlp) (in-package :kbnlp)
If you want to use my library from a different directory location you will need to push the path of the knowledgebooks nlp directory to asdf:*central-registry*.1 The le knowledgebooks nlp/example.lisp contains sample code for using the library:
A text-object Lisp struct contains attributes for a summary of the text, human name and place name entities found in the text, part of speech tags, and topic tags describing the input text:
With most of the output not shown for brevity, here is the output after loading the le example.lisp:
#S(TEXT :URL "http://knowledgebooks.com/docs/001" :TITLE "test doc 1" :SUMMARY "Often those amendments are ..." :CATEGORY-TAGS (("news_politics.txt" 0.38268) ("news_economy.txt" 0.31182) ("news_war.txt" 0.20174)) :HUMAN-NAMES ("President Bill Clinton") :PLACE-NAMES ("Florida") :TEXT #("President" "Bill" "Clinton" "ran" ...) :TAGS #("NNP" "NNP" "NNP" "VBD" "IN" "NN" ...))
The part of speech tags are dened in my FastTag project that you can download from my Open Source web page makrwatson.com/opensource.
1 The
data le loading code depends on the exact directory name knowledgebooks nlp so do not change
it.
86
2 I provide small linguistic example data les with the example code for this book that should be sufcient
for experimenting with my library and use in most applications. For my own research and work for my consulting customers I use a large linguistic data set that takes about thirty seconds to load. 3 It takes about 3 milliseconds to process 100 words of input text on my MacBook.
87
89
Table 13.1.: Subset of Freebase API Arguments Argument Argument type Format Default value query required string type optional string /location/citytown limit optional integer 20 start optional integer 0
The best way to access Freebase is to use the MQL Query Language. However, Freebase now has an RDF interface1 and we will use this interface in this chapter. Please note that the Java, Clojure, JRuby, and Scala edition of this book wraps the Java Freebase MQL client library for full access to Freebase. You can refer to the other edition of this book for more detailed information concerning Freebase. http://rdf.freebase.com/ The RDF interface can fetch all RDF triples for a given Freebase RDF resource identier. As an example, here is the identier for the Freebase topic about me: http://rdf.freebase.com/ns/en.mark_louis_watson The returned triples are (most not shown for brevity):2 <http://rdf.freebase.com/ns/en.mark_louis_watson> <http://rdf.freebase.com/ns/people.person.date_of_birth> "1951" . <http://rdf.freebase.com/ns/en.mark_louis_watson> <http://rdf.freebase.com/ns/common.topic.alias> "Mark Watson"@en . <http://rdf.freebase.com/ns/en.mark_louis_watson> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.freebase.com/ns/computer.software_developer> . <http://rdf.freebase.com/ns/en.mark_louis_watson> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.freebase.com/ns/book.author> . <http://rdf.freebase.com/ns/en.mark_louis_watson> <http://creativecommons.org/ns#attributionName> "Source: Freebase - The Worlds database" .
1 http://blog.freebase.com/2008/10/30/introducing 2 Output
90
<http://rdf.freebase.com/ns/en.mark_louis_watson> <http://rdf.freebase.com/ns/people. \\ person.education> <http://rdf.freebase.com/ns/m.0b6_ggq> . <http://rdf.freebase.com/ns/m.0b6_ggq> <http://rdf.freebase.com/ns/education. \\ education.institution> <http://rdf.freebase.com/ns/en.university_of_california_santa_ba You can use the Freebase RDF browser3 to nd Freebase RDF resource identiers using free text search.
91
(defvar *hs* (with-output-to-string (sstrm) (json:encode *h* sstrm))) (defvar *s* (concatenate string mql-url (net.aserve.client::uriencode-string *hs*))) (defvar *str-results* (do-http-request *s*)) (format t "Results:%%A%%" *str-results*) The output is a string containing encoded JSON data and looks like: { "code": "/api/status/ok", "result": [ { "name": "Mark Louis Watson", "type": [ "/common/topic", "/people/person", "/book/author", "/computer/software_developer" ] } ], "status": "200 OK", "transaction_id": "cache;cache04.p01;2010-10-23T22" } This code snippet gets the result as Lisp data: (defvar *results* (json:parse *str-results*)) (maphash #(lambda (key val) (format t "key: A value: A%" key val)) (car (gethash "result" *results*)))
92
13.3. Freebase Wrapup The output looks like: key: name value: Mark Louis Watson key: type value: (/common/topic /people/person /book/author /computer/software_developer)
4 http://www.freebase.com/app/queryeditor
93
http://dbpedia.org/snorql
Figure 14.1 shows the DBpedia Snorql web interface showing the results of one of the sample SPARQL queries used in this section.
95
A good way to become familiar with the DBpedia ontologies used in these examples is to click the links for property names and resources returned as SPARQL query results, as seen in Figure 14.1. Here are three different sample queries that you can try: PREFIX dbo: <http://dbpedia.org/ontology/> SELECT ?s ?p WHERE { ?s ?p <http://dbpedia.org/resource/Berlin> . } ORDER BY ?name
96
14.2. Interactively Finding Useful DBpedia Resources Using the gFacet Browser PREFIX dbo: <http://dbpedia.org/ontology/> SELECT ?location ?name ?state_name WHERE { ?location dbo:state ?state_name . ?location dbpedia2:name ?name . FILTER (LANG(?name) = en) . } limit 25 The http://dbpedia.org/snorql SPARQL endpoint web application is a great resource for interactively exploring the DBpedia RDF datastore. We will look at an alternative browser in the next section.
14.2. Interactively Finding Useful DBpedia Resources Using the gFacet Browser
The gFacet browser allows you to nd RDF resources in DBpedia using a search engine. After nding matching resources you can then dig down by clicking on individual search results. You can access the gFacet browser using this URL: http://www.gfacet.org/dbpedia/ Figures 14.2 and 14.3 show a search example where I started by searching for Arizona parks, found ve matching resources, clicked the rst match Parks in Arizona, and then selected Dead Horse State Park.1
97
98
14.4. Using the AllegroGraph SPARQL Client Library to access DBpedia http://lookup.dbpedia.org/api/search.asmx/KeywordSearch? \\ QueryString=Flagstaff\&QueryClass=XML\&MaxHits=10 As you will see in Section ??, the search client needs to lter results returned from the lookup web service since the lookup service returns results with partial matches of search terms. I prefer to get only results that contain all search terms. The following sections contain implementations of a SPARQL client and a free text search lookup client. DBpedia is a mostly automatic extraction of RDF data from Wikipedia using the metadata in Wikipedia articles. You have two alternatives for using DBpedia in your own applications: using the public DBpedia SPARQL endpoint or downloading all or part of the DBpedia RDF data and loading it into your own RDF data store (e.g., AllegroGraph or Sesame). The public DBpedia SPARQL endpoint URI is http://dbpedia.org/sparql. For the purpose of the examples in this book we will simply use the public SPARQL endpoint but for serious applications I suggest that you run your own endpoint.
99
14. Common Lisp Client Library for DBpedia (((|?person| . !<http://dbpedia.org/resource/Benjamin_Franklin>)) ((|?person| . !<http://dbpedia.org/resource/Cotton_Mather>)) ((|?person| . !<http://dbpedia.org/resource/James_Spader>)) ((|?person| . !<http://dbpedia.org/resource/Christa_McAuliffe>)) ((|?person| . !<http://dbpedia.org/resource/Gordon_K_MacLeod>)) ((|?person| . !<http://dbpedia.org/resource/Edgar_Allan_Poe>))) :SELECT #(|?person|)
100
mark:lisp_practical_semantic_web markw$ cd geonames/ mark:geonames markw$ alisp CL-USER(1): :ld test CL-USER(2): (cl-geonames:geo-country-info :country ("US" "FR")) (:|geonames| (:|country| (:|countryCode| "FR") (:|countryName| "France") (:|isoNumeric| "250") (:|isoAlpha3| "FRA") (:|fipsCode| "FR") (:|continent| "EU") (:|capital| "Paris") (:|areaInSqKm| "547030. (:|population| "64768389") ...) (:|country| (:|countryCode| "US") (:|countryName| "United States" (:|isoNumeric| "840") (:|isoAlpha3| "USA") (:|fipsCode| "US") (:|continent| "NA") (:|capital| "Washington") (:|areaInSqKm| "9629091.0") (:|population| "310232863") ...))
1 The
geonames.org web service is limited to 2000 queries per hour from any single IP address. Commercial support is available, or, with some effort, you can also run GeoNames on your own server with some effort. There are, for example, a few open source Ruby on Rails projects that use the Geonames data les and provide a web service interface. 2 http://code.google.com/p/cl-geonames/ 3 Drakma, s-xml, cl-json, chunga, cl-base64, exi-streams, puri, split-sequence, trivial-gray-streams, usocket
101
CL-USER(3): (cl-geonames::geo-country-code "42.21" "-71.5") (:|geonames| (:|country| (:|countryCode| "US") (:|countryName| "United State (:|distance| "0.0"))) CL-USER(4): (cl-geonames::geo-elevation-srtm3 "42.21" "-71.5") "122" CL-USER(5): (cl-geonames::geo-country-subdivision "42.21" "-71.5 (:|geonames| (:|countrySubdivision| (:|countryCode| "US") (:|countryName| "United States") (:|adminCode1| "MA") (:|adminName1| "Massachusetts") ((:|code| :|type| "FIPS10-4") ((:|code| :|type| "ISO3166-2") "MA") (:|distance| "0.0"))) CL-USER(6): (cl-geonames::geo-find-nearby-place-name "42.21" "-7 (:|geonames| (:|geoname| (:|toponymName| "Hayden Row") (:|name| "Hayden Row" (:|lat| "42.20426") (:|lng| "-71.51062") (:|geonameId| "493915 (:|countryCode| "US") (:|countryName| "United States") (:|fcl| (:|fcode| "PPL") ...) (:|geoname| (:|toponymName| "Hopkinton") (:|name| "Hopkinton") (:|lat| "42.22358") (:|lng| "-71.52282") (:|geonameId| "725769 (:|countryCode| "US") (:|countryName| "United States") (:|fcl| (:|fcode| "PPL") ...) (:|geoname| (:|toponymName| "Hopkinton") (:|name| "Hopkinton") (:|lat| "42.22871") (:|lng| "-71.52256") (:|geonameId| "493988 (:|countryCode| "US") (:|countryName| "United States") (:|fcl| (:|fcode| "PPL") ...) (:|geoname| (:|toponymName| "Camp Bob White") (:|name| "Camp Bob White") (:|lat| "42.22648") (:|lng| "-71.46 (:|geonameId| "4932024") (:|countryCode| "US") (:|countryName| "United States") (:|fcl| "P") (:|fcode| "PPL") (:|geoname| (:|toponymName| "North Milford") (:|name| "North Mi (:|lat| "42.18343") (:|lng| "-71.53784") (:|geonameId| "494567 (:|countryCode| "US") (:|countryName| "United States") (:|fcl| (:|fcode| "PPL") ...)) CL-USER(7): (cl-geonames::geo-search "Sedona" "Sedona" "Sedona" ((:|geonames| :|style| "MEDIUM") (:|totalResultsCount| "1") (:|geoname| (:|toponymName| "Sedona") (:|name| "Sedona") (:|lat| "34.86974") (:|lng| "-111.76099") (:|geonameId| "53136 (:|countryCode| "US") (:|countryName| "United States") (:|fcl| (:|fcode| "PPL")))
102
103
Part IV.
105
1 The
part of this example using AllegroGraph is specic to Franz Lisp and AllegroGraph but what you will learn in Chapter 17 will work well with other Common Lisp implementations like SBCL.
107
108
16.1. Implementing the Back End APIs (defun add-document (doc-uri doc-title doc-text) (let* ((txt-obj (kbnlp:make-text-object doc-text :title doc-title :url doc-uri)) (resource (db.agraph.user::resource doc-uri))) (db.agraph.user::add-triple resource !rdf:type !kb:document) (db.agraph.user::add-triple resource !kb:docTitle (db.agraph.user::literal doc-title)) (db.agraph.user::add-triple resource !kb:docText (db.agraph.user::literal doc-text)) (dolist (human-name (kbnlp::text-human-names txt-obj)) (pprint human-name) (db.agraph.user::add-triple resource !kb:docPersonEntity (db.agraph.user::literal human-name))) (dolist (place-name (kbnlp::text-place-names txt-obj)) (pprint place-name) (db.agraph.user::add-triple resource !kb:docPlaceEntity (db.agraph.user::literal place-name))) (dolist (tag (kbnlp::text-category-tags txt-obj)) (pprint tag) (db.agraph.user::add-triple resource !kb:docTag (db.agraph.user::literal (format nil "A/A" (car tag) (cadr tag)))))))
Here I used my NLP library but as an exercise, you could rewrite this using the Open Calais client library I provided in Chapter 11. It can be useful having a local entity extraction library so applications do not need access to the Internet.2 The function doc-search is a simple wrapper for using the AllegroGraph text search APIs:
2 As
I write this chapter in October 2010, I am on a ship in the Pacic Ocean with a poor Internet connection.
109
16. Semantic Web Portal Back End Services (defun doc-search (search-term-string) "return a list of matching doc IDs" (db.agraph.user::freetext-get-ids search-term-string)) Search results are retuned as a list of document IDs and the following function getdoc-info can be used to reconstruct a document object from the RDF triples containing original data and semantic data for a document: (defun get-doc-info (doc-id) (let* ((parts (db.agraph.user::get-triple-by-id doc-id)) (subject (db.agraph.user::subject parts)) (predicate (db.agraph.user::predicate parts)) (object (db.agraph.user::object parts))) (mapcar #(lambda (obj) (list (db.agraph.user::part->concise (db.agraph.user::subject obj)) (db.agraph.user::part->concise (db.agraph.user::predicate obj)) (db.agraph.user::part->terse (db.agraph.user::object obj)))) (db.agraph.user::get-triples-list :s subject)))) I dont list them here, but the le backend.lisp also contains utility functions printall-docs and delete-all-docs that you might nd useful if you interactively experiment with the code in this chapter.
110
16.2. Unit Testing the Backend Code (let ((person-list (db.agraph.user::get-triples-list :p !kb:docPersonEntity)) (place-list (db.agraph.user::get-triples-list :p !kb:docPlaceEntity))) (lisp-unit:assert-equal (db.agraph.user::part->string (db.agraph.user::object (car person-list))) "\"John Smith\"") (lisp-unit:assert-equal (db.agraph.user::part->string (db.agraph.user::object (car place-list))) "\"Mexico\""))) (lisp-unit:define-test "print-triples" (print-all-docs) (lisp-unit:assert-equal t t)) (lisp-unit:define-test "good-login" (lisp-unit:assert-equal t (valid-login? "demo" "demo"))) (lisp-unit:define-test "bad-login" (lisp-unit:assert-equal nil (valid-login? "demo" "demo2"))) (run-tests) (db.agraph.user::delete-triples (db.agraph.user::delete-triples (db.agraph.user::delete-triples (db.agraph.user::delete-triples (db.agraph.user::delete-triples (db.agraph.user::delete-triples (db.agraph.user::delete-triples :o :p :p :p :p :p :p !kb:document) !kb:docText) !kb:docTitle) !kb:docPersonEntity) !kb:docPlaceEntity) !kb:docTag) !kb:doc)
bad-login: 1 assertions passed, 0 failed. create-doc-test1: 2 assertions passed, 0 failed. checking login: demo demo good-login: 1 assertions passed, 0 failed. print-triples: 1 assertions passed, 0 failed. TOTAL: 5 assertions passed, 0 failed, 0 execution errors.
111
3 In
112
113
Figure 17.1.: Example Semantic Web Application Login Page 2. browser.clp - this is the page for supporting search and content browsing 3. footer.clp - this is a page fragment that is included at the bottom of most web pages for this application 4. header.clp - this is a page fragment that is included at the top of most web pages for this application 5. left bar.clp - this page fragment implements the left hand side of the page navigation menu 6. login.clp - contains elds for a user to enter account name and password 7. upload.clp - contains an HTML form for specifying a local le to upload to the web application Figure 17.1 shows the login page.
114
17.3. Common Lisp Code for Web Application also contains all other code required for the web application. I am going to assume that you open the le webapp.lisp in your favorite text editor and read through it; here I will only list and discuss bits of code that are either non-intuitive or especially interesting. I start by requiring libraries and loading my own code:23 (eval-when (compile load eval) (require :aserve) (require :webactions) (load "backend.lisp") (load "../utils/file-utils.lisp")) (defpackage :user (:use :net.aserve :net.html.generator)) (in-package :user) The following code snippet registers WebActions controller functions with HTTP GTE and POST actions: (webaction-project "dojotest" :destination "web/" :index "login" :map (("menu" action-check-login) ("menu" "menu.clp") ("browser" "browser.clp") ("search" "/do-search") ("admin" "/upload.clp") ("about" "/about.clp") ("wiki" "/wiki.clp") ("upload" "/upload.clp") ;;("upload" "/do-upload" "/do-upload" (:redirect t)) ("gotlogin" action-got-login) ("login" "login.clp"))) The login.clp le contains an HTML form for the users account and password:4 <form id="myForm2" action="gotlogin" method="post"> User name: <input type="text" name="username" value="demo" />
2 I am loading the source les for my own utilities so they will not be natively compiled when using Franz
Lisp. Other Common Lisp implementations like SBCL will compile code that is loaded from source. note that the code snippets shown in this section are not in the same order as they appear in the source le 4 Formatting HTML is not shown.
3 Please
115
The action gotlogin checks to see if there is user data in the current request:
(defun action-got-login (req ent) (let ((session (websession-from-req req))) (let ((user (cdr (assoc "username" (net.aserve:request-query req) :test #equal))) (passwd (cdr (assoc "password" (net.aserve:request-query req) :test #equal)))) (if* (and user passwd (valid-login? user passwd)) then ; already logged in "browser" ; just go to the real home else ; must login "login"))))
116
17.3. Common Lisp Code for Web Application The WebActions controller function that handles search requests on the browser page uses the utilities in the le backend.lisp that I discussed in Chapter 16:
(defun do-search (req ent) (net.aserve:with-http-response (req ent) (net.aserve:with-http-body (req ent) (let ((test-input (cdr (assoc "input_test_form_text" (net.aserve:request-query req) :test #equal)))) (princ (format nil "AJAX: A <a href=\"/search\" target=\"new\">click here</a>" test-input) net.html.generator:*html-stream*))))) Handling le uploads is a bit tricky; I copied the code from the functions fetchmultipart-sequence and process-upload-form from the AllegroServe and WebActions documentations, adding new functionality to process-upload-form for processing the text of uploaded les and creating RDF triples in our local data store. The last code snippet from le webapp.lisp that we will look at is the code for showing search results on the browser page:
(def-clp-function search_results (req ent args body) (let ((session (websession-from-req req)) (test-input (cdr (assoc "input_test_form_text" (net.aserve:request-query req) :test #equal)))) (push test-input (websession-variable session "history")) (net.html.generator:html (:princ (format nil "Search results: A" test-input))) (if test-input (dolist (doc-id (doc-search test-input)) (princ (format nil "<pre>A</pre>%" (get-doc-info doc-id)) net.html.generator:*html-stream*))))) Figure 17.3 shows the login page.
117
5 As
6 For
well as other high quality frameworks like Edi Weitzs Hunchentoot application server. example, AllegroGraph, Sesame, Redland, Jena, 4store, the Talis Platform, and Virtuoso.
118
Index
DBpedia, 95 dereferenceable URI, 73 dereferenceable URIs, 27, 31, 89 Descriptive Logic, 35 example RDF data, 41 FOAF, 36 Freebase, 89 GeoNames web services, 101 gFacet Browser, 97 Linked Data, 27, 73 lookup.dbpedia.org Web Service, 97 N-triple, 28 N-Triple RDF serialization, 26 N3, 29, 41 N3 RDF serialization, 26 OWL, 35 QNames, 33 RDF (Resource Description Framework), 25 RDF graph, 26 RDF properties, 31 RDF statement, 26 rdf:property, 35 RDFS, 25 RDFS (RDF Schema), 33 RDFS++, 34 rdfs:domain, 35 rdfs:subClassOf, 35 rdfs:subPropertyOf, 35 URIs (Uniform Resource Identiers), 27 Snorql, 95 SPARQL, 41, 44 SPARQL endpoint, 26 Tim Berners-Lee, 73
119