Querying Graphs With Neo4j
What is a Graph Database?

We live in a connected world. There are no isolated pieces of information, but rich, connected domains all around us. Only a database that embraces relationships as a core aspect of its data model is able to store, process, and query connections efficiently. While other databases compute joins expensively at query time, a graph database stores connections as first class citizens, readily available for any “join-like” navigation operation. Accessing those already persisted connections is an efficient, constant-time operation and allows you to traverse millions of relationships per second.

Independent of the size of the total dataset, graph databases excel at managing highly connected data and complex queries. Armed only with a pattern and a set of starting points, graph databases explore the larger neighborhoods around these initial entities, collecting and aggregating information from millions of nodes and relationships, but leaving the billions outside the search perimeter untouched.

The Property Graph Model

If you’ve ever worked with an object model or entity relationship diagram, the labeled property graph model will be familiar.

[Figure: Labeled Property Graph Data Model]

[...] value-pairs). Nodes can be tagged with several labels representing different roles in the domain. Besides putting a subset of node properties and relationships into [...] performance. Note that although they are directed, relationships can always be navigated in both directions.

There is only one consistency rule in a graph database: “No broken links”. Since a relationship always has to have a start and end node, you can only delete a node by also removing its associated relationships.

What is Neo4j?

Neo4j is an open-source, NoSQL graph database implemented mainly in Java and Scala. Its development started in 2003 and it has been sponsored by Neo Technology, Inc. since 2011. The source code and issue tracking are available on github.com/neo4j, with an active community supporting users on Stack Overflow and the Neo4j Google Group.

Neo4j is used by hundreds of thousands of users in almost all industries. Use cases include matchmaking, network management, impact analysis, software analytics, scientific research, routing, organizational and project management, content management, recommendations, social networks, and more.

Neo4j Editions

Neo4j’s free Community Edition is a high-performance, fully ACID-transactional database. It includes (but is not limited to) all the functionality described in this Refcard.
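The property graph model and its single consistency rule (“No broken links”) described earlier can be made concrete with a small sketch. The following is a toy in-memory model in plain Python, not Neo4j's API; the class name `PropertyGraph`, its methods, and the sample names are hypothetical and exist only for illustration.

```python
class PropertyGraph:
    """Toy labeled property graph (illustration only, not Neo4j's API)."""

    def __init__(self):
        self.nodes = {}      # node id -> {"labels": set, "props": dict}
        self.rels = {}       # rel id -> (start id, type, end id, props)
        self._next_id = 0

    def _new_id(self):
        self._next_id += 1
        return self._next_id

    def add_node(self, labels, **props):
        node_id = self._new_id()
        self.nodes[node_id] = {"labels": set(labels), "props": props}
        return node_id

    def add_rel(self, start, rel_type, end, **props):
        # A relationship always needs an existing start node and end node.
        if start not in self.nodes or end not in self.nodes:
            raise ValueError("relationship needs existing start and end nodes")
        rel_id = self._new_id()
        self.rels[rel_id] = (start, rel_type, end, props)
        return rel_id

    def neighbours(self, node_id):
        # Relationships are directed but can be navigated in both directions.
        for start, _type, end, _props in self.rels.values():
            if start == node_id:
                yield end
            elif end == node_id:
                yield start

    def delete_node(self, node_id, detach=False):
        # "No broken links": refuse to delete a node that still has
        # relationships unless they are removed together with it.
        attached = [rel_id for rel_id, (start, _t, end, _p) in self.rels.items()
                    if node_id in (start, end)]
        if attached and not detach:
            raise ValueError("node still has relationships")
        for rel_id in attached:
            del self.rels[rel_id]
        del self.nodes[node_id]
```

With two nodes joined by one `KNOWS` relationship, `delete_node(a)` raises, while `delete_node(a, detach=True)` removes the relationship and the node together.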
You can download Neo4j from neo4j.com/download. Neo4j Server can be installed and run on all operating systems. It provides an easy-to-use web interface at localhost:7474.

• On Windows, run the installer, choose a directory and start the server
• On the other platforms, unzip the file and change to the directory in a terminal, e.g. ~/Download/neo4j-community-2.1.5

The simplest way of getting started is to use Neo4j’s database browser to execute your graph queries (written in Cypher, the graph query language described in this Refcard) in a workbench-like fashion. Results are rendered as either intuitive graph visualizations or as easy-to-read exportable tables.

Neo4j is a NoSQL Graph Database

• open source
• welcoming UI
• easy data modeling
• readable queries
• active community
• high performance
• optional schema

A remote Neo4j Server can be accessed via its Cypher HTTP API, either directly or through one of the many available language drivers. For especially high performance use cases, you can add Neo4j Server extensions in any JVM language to access Neo4j’s internal database engine directly without network overhead.

Once you have Neo4j running (accessible at localhost:7474 or on a remote server), open the Neo4j Browser and check out the Cypher Workbench. It provides guides to get you started, links to the manual and a sample movie database.

[Figure: Neo4j Browser, your graph-query workbench]

Let’s add some data!

In the sidebar on the left, open the Information tab (the “i”), then choose the “Movie Graph App”. You’ll see a slideshow explaining the dataset of Movie and Person nodes connected via the ACTED_IN and DIRECTED relationships. On the second slide, a large chunk of Cypher code contains the statements to create the dataset. Click on it to transfer the whole statement to the command line above.

Run the statement by hitting the “Execute” button on the top right. This inserts data into the database and renders a lonely node as a result.

For a simple query that returns a Person node with its associated Movies, run:

    MATCH (p:Person {name:"Tom Hanks"})-->(m:Movie)
    RETURN p,m

In the next step we’re going to connect to the database.
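As a rough sketch of what accessing the server through its Cypher HTTP API can look like, the snippet below builds a request for the Tom Hanks query using only the Python standard library. It assumes a Neo4j 2.x server on localhost:7474 that exposes the transactional endpoint at /db/data/transaction/commit and requires no authentication; the helper names `build_request` and `run` are hypothetical. The name is passed as a parameter rather than a literal.

```python
import json
import urllib.request

# Assumption: transactional Cypher endpoint of a local Neo4j 2.x server.
# Adjust host and port for a remote server.
ENDPOINT = "http://localhost:7474/db/data/transaction/commit"


def build_request(statement, parameters=None, endpoint=ENDPOINT):
    """Build (but do not send) an HTTP request carrying one Cypher statement."""
    payload = {"statements": [{"statement": statement,
                               "parameters": parameters or {}}]}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json; charset=UTF-8"},
        method="POST",
    )


def run(statement, parameters=None):
    """Send the statement to the server and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(statement, parameters)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# The query from the text, with the name supplied as the {name} parameter.
QUERY = "MATCH (p:Person {name:{name}})-->(m:Movie) RETURN p,m"
request = build_request(QUERY, {"name": "Tom Hanks"})
```

Calling `run(QUERY, {"name": "Tom Hanks"})` sends the request; it needs a running server, so only the request construction is exercised here.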
CREATE

Command                                    Description

CREATE (n:Person {name: {value}})          Create a node with the given properties.

CREATE (n:Person {map})                    Create a node with the given properties.

CREATE (n:Person {collOfMaps})             Create nodes with the given properties.

CREATE (n)-[r:KNOWS]->(m)                  Create a relationship with the given type and direction; bind an identifier to it.

MERGE (n:Person {name: {value}})           Match pattern and create it if it does not exist.
ON CREATE SET n.created=timestamp()        Use ON CREATE and ON MATCH for conditional updates.
ON MATCH SET
  n.counter=coalesce(n.counter,0)+1,
  n.accessed=timestamp()

MATCH (a:Person {name:{value1}}),          MERGE finds or creates a relationship between the nodes.
      (b:Person {name: {value2}})
MERGE (a)-[r:LOVES]->(b)

INDEX

Command                                    Description

CREATE INDEX ON :Person(name)              Create an index on the label Person and property name.

MATCH (n:Person)                           An index can be automatically used for equality comparison.
WHERE n.name = {value}                     Note that for example lower(n.name) = {value} will not use an index.

MATCH (n:Person)                           An index can be automatically used for the IN collection checks.
WHERE n.name IN {values}

CONSTRAINT

Command                                    Description

CREATE CONSTRAINT ON (p:Person)            Create a unique constraint on the label Person and property name.
ASSERT p.name IS UNIQUE                    If any other node with the label Person is updated or created with a
                                           value for name that already exists, the write operation will fail.
                                           This constraint will create an accompanying index.
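The MERGE row with ON CREATE and ON MATCH, and the unique constraint on Person names, can be pictured with a plain-Python analogy. This is only a sketch of the match-or-create behaviour described in the tables, not how Neo4j implements it; the dictionary `people` and the functions `merge_person` and `create_person` are hypothetical.

```python
import time

# Stand-in for all :Person nodes, keyed by their unique name property.
people = {}


def merge_person(name, now=None):
    """Mimic MERGE (n:Person {name: ...}) with ON CREATE / ON MATCH branches."""
    now = int(time.time() * 1000) if now is None else now  # milliseconds
    node = people.get(name)
    if node is None:
        # ON CREATE SET n.created = timestamp()
        node = people[name] = {"name": name, "created": now}
    else:
        # ON MATCH SET n.counter = coalesce(n.counter, 0) + 1,
        #              n.accessed = timestamp()
        node["counter"] = node.get("counter", 0) + 1
        node["accessed"] = now
    return node


def create_person(name):
    """Mimic CREATE under a uniqueness constraint on Person names."""
    if name in people:
        # With the constraint in place, a duplicate name makes the write fail.
        raise ValueError("a Person named %r already exists" % name)
    people[name] = {"name": name}
    return people[name]
```

Calling `merge_person` twice with the same name leaves one entry: the first call takes the ON CREATE branch, the second takes the ON MATCH branch and sets `counter` to 1.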
Command                                    Description

(m)<-[:KNOWS]-(n)                          A relationship from n to m of type KNOWS exists.

(n)-[:KNOWS|:LOVES]->(m)                   A relationship from n to m of type KNOWS or LOVES exists.

(n)-[r]->(m)                               Bind an identifier to the relationship.

(n)-[*1..5]->(m)                           Variable length paths can span 1 to 5 hops.

(n)-[*]->(m)                               Variable length path of any depth. See performance tips.

(n)-[:KNOWS]->(m:Label                     Match or set properties in MATCH, CREATE, CREATE UNIQUE or MERGE clauses.
  {property: {value}})

shortestPath(                              Find a single shortest path for previously matched nodes.
  (n1)-[*..6]-(n2))

allShortestPaths(                          Find all shortest paths.
  (n1)-[*..6]-(n2))

Maps

Command                                    Description

{name:"Alice", age:38,                     Literal maps are declared in curly braces much like property maps.
 address:{city:"London",                   Nested maps and collections are supported.
 residential:true}}

MERGE (p:Person{name:{map}.name})          Maps can be passed in as parameters and used as map or by accessing keys.
ON CREATE SET p={map}

range({start},{end},{step}) AS coll        Range creates a collection of numbers (step is optional).

MATCH (n:Person)-[r]->(m)                  Nodes and relationships are returned as maps of their data.
RETURN n,r,m

map.name, map.age, map.children[0]         Map entries can be accessed by their keys.
                                           Invalid keys result in an error.
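To show what a bounded pattern such as shortestPath((n1)-[*..6]-(n2)) asks for, here is a minimal breadth-first search in plain Python. It ignores relationship direction and stops expanding at a hop limit. It is a sketch of the idea only, not Neo4j's algorithm; the function name `shortest_path` and the edge-list input are assumptions made for this example.

```python
from collections import deque


def shortest_path(edges, start, goal, max_hops=6):
    """Breadth-first search that ignores direction and stops at max_hops.

    Simplification: if start equals goal, the zero-hop path is returned.
    """
    adjacency = {}
    for a, b in edges:
        # Store each relationship in both directions, like (n1)-[..]-(n2).
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 == max_hops:
            continue  # upper limit reached, do not expand further
        for neighbour in sorted(adjacency.get(path[-1], ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no path within the hop limit
```

The hop limit is what keeps the search local: without it, a search for an unreachable node would visit every node connected to the start.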
Command                                    Description

collect(n.property)                        Value collection, ignores NULL.

sum(n.property)                            Sum numerical values. Similar functions are avg, min, max.

percentileDisc(n.property,                 Discrete percentile. Continuous percentile is percentileCont.
  {percentile})                            The percentile argument is from 0.0 to 1.0.

stdev(n.property)                          Standard deviation for a sample of a population.
                                           For an entire population use stdevp.

CASE

Command                                    Description

CASE n.eyes                                Return THEN value from the matching WHEN value.
  WHEN 'blue' THEN 1                       The ELSE value is optional; if it is missing, NULL is returned.
  WHEN 'brown' THEN 2
  ELSE 3
END

CASE                                       Return THEN value from the first WHEN predicate evaluating to TRUE.
  WHEN n.eyes = 'blue' THEN 1              Predicates are evaluated in order.
  WHEN n.age < 40 THEN 2
  ELSE 3
END

• To use the older syntax, prepend your Cypher statement with CYPHER 1.9.

Performance Tips

• Use parameters instead of literals when possible. This allows Cypher to reuse your queries instead of having to parse and build new execution plans.
• Always set an upper limit for your variable length patterns. It’s easy to have a query touch all nodes in a graph by mistake.
• Return only the data you need. Avoid returning whole nodes and relationships; instead, return only the properties you need.

Use Case: Recommendations

Recommendations in Neo4j are both powerful and easy to implement. You can recommend anything, including friends, music, products, places, books, jobs, travel connections ... even Refcardz.
    CREATE (:Author {name:"Michael Hunger"})
      -[:WROTE]->(neo4j:Refcard {title:"Querying Graphs with Neo4j"})
      -[:FOR_SKILL]->(:Skill {level:1}),
      (neo4j)<-[:TAGGED]-(nosql:Topic {name:"NoSQL"}),
      (neo4j)<-[:TAGGED]-(:Topic {name:"GraphDB"})
    CREATE (:Author {name:"Oliver Gierke"})
      -[:WROTE]->(springData:Refcard {title:"Core Spring Data"})
      -[:FOR_SKILL]->(:Skill {level:2}),
      (springData)<-[:TAGGED]-(:Topic {name:"Framework"}),
      (springData)<-[:TAGGED]-(nosql)
    CREATE (:Author {name:"Alex Baranau"})
      -[:WROTE]->(hbase:Refcard {title:"Apache HBase"})
      -[:FOR_SKILL]->(:Skill {level:5}),
      (hbase)<-[:TAGGED]-(:Topic {name:"Infrastructure"}),
      (hbase)<-[:TAGGED]-(nosql)

[Figure: graph of the example dataset. Author nodes (Michael Hunger, Oliver Gierke, Alex Baranau) are connected via WROTE to Refcard nodes (Querying Graphs with Neo4j, Core Spring Data, Apache HBase), which are connected via FOR_SKILL to Skill nodes and via TAGGED to Topic nodes (NoSQL, GraphDB, Framework, Infrastructure).]

Now we can run a simple query that asks the following question: “I really liked the Core Spring Data Refcard. Matching my reading history and skills, what other Refcardz should I read?”

    MATCH (ref1:Refcard {title:"Core Spring Data"})
      <-[:TAGGED]-(t)-[:TAGGED]->(ref2:Refcard)<-[:WROTE]-(author)
    MATCH (ref1)-[:FOR_SKILL]->(skill1), (ref2)-[:FOR_SKILL]->(skill2)
    WHERE abs(skill1.level-skill2.level) < 2
    RETURN ref2.title AS Title, author.name AS Author,
      count(*) as Score, collect(DISTINCT t.name) as Topics
    ORDER BY Score DESC

    Title                         Author           Score   Topics
    Querying Graphs with Neo4j    Michael Hunger   1       [NoSQL]

The database suggests you read the "Querying Graphs with Neo4j" Refcard if you haven’t done so already, because its skill level is similar to that of the Core Spring Data Refcard and it shares a topic: NoSQL.

Explore this example further in one of our live graph gist presentations.
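To see why the query returns that single row, the same logic can be traced in plain Python over the three Refcardz from the CREATE statements. This is a hand-written sketch, not generated from Cypher; the `refcards` dictionary and the `recommend` function are hypothetical. It assumes each Refcard has exactly one author and one skill level, so that `count(*)` equals the number of shared topics.

```python
# The example dataset as plain Python data (one author and one skill level
# per Refcard, which is an assumption that holds for this dataset).
refcards = {
    "Querying Graphs with Neo4j": {"author": "Michael Hunger", "level": 1,
                                   "topics": {"NoSQL", "GraphDB"}},
    "Core Spring Data": {"author": "Oliver Gierke", "level": 2,
                         "topics": {"Framework", "NoSQL"}},
    "Apache HBase": {"author": "Alex Baranau", "level": 5,
                     "topics": {"Infrastructure", "NoSQL"}},
}


def recommend(liked, max_level_gap=2):
    """Return (title, author, score, topics) rows, highest score first."""
    source = refcards[liked]
    rows = []
    for title, card in refcards.items():
        if title == liked:
            continue  # do not recommend the Refcard we started from
        shared = source["topics"] & card["topics"]  # topics tagging both
        if not shared:
            continue
        # WHERE abs(skill1.level - skill2.level) < 2
        if abs(source["level"] - card["level"]) >= max_level_gap:
            continue
        # count(*) AS Score, collect(DISTINCT t.name) AS Topics
        rows.append((title, card["author"], len(shared), sorted(shared)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


rows = recommend("Core Spring Data")
```

Apache HBase also shares the NoSQL topic, but its skill level of 5 is too far from 2, so only the Neo4j Refcard remains.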
Copyright © 2014 DZone, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.
Version 1.0