Visual Graph
What is Elasticsearch?
Try it yourself!
• Project managers and non-technical staff looking for a detailed introduction to visualizing
data from Elasticsearch with KeyLines.
• Developers and technical staff seeking a non-technical introduction to visualizing data
from Elasticsearch with KeyLines.
What is Elasticsearch?
Elasticsearch is a fast and scalable open source search engine.
Its power and out-of-the-box simplicity have made it a popular option for organizations needing a
way to search very large volumes of data. It can support near-real-time searching of data on a
petabyte scale, using a system of sharding and routing to scale outwards from the beginning.
The Elasticsearch engine itself is built on the Apache Lucene software library. Lucene is a high-
performance technology for searching and indexing data, but it is also very complex.
Elasticsearch makes the power of Lucene more readily usable by pre-selecting some sensible
defaults and providing a more intuitive REST API.
Elasticsearch powers the search functionality of some very data-rich organizations, including
Facebook, Wikimedia and Stack Exchange. It is also increasingly popular with KeyLines
developers looking for a powerful and scalable back-end technology for their graph applications.
In this Getting Started guide we are going to explain how you can use the KeyLines toolkit to
build a UI for your Elasticsearch server.
Throughout the document, we will refer to a number of different technologies in the Elastic Stack:
• Logstash – a tool for streaming, munging and loading data into Elasticsearch
Out of the box, Elastic Graph uses relevance scoring to help identify the most meaningful
connections. This simple analysis can be enhanced with KeyLines visual graph analysis
functionality, making it easier for users to understand complex network trends and uncover
outliers.
A screengrab of a Kibana dashboard, via http://elastic.co
Kibana includes a Graph plugin, allowing users to visually explore data connections:
As both Kibana and KeyLines are web technologies, they complement each other perfectly.
Logstash – a data management tool
The easiest way to load data into Elasticsearch is with Logstash, a command-line tool. With this
approach you can supply your data as a CSV file and leave Logstash to parse it and load it into
your Elasticsearch instance.
In this Getting Started guide we are going to follow the steps required to build a simple KeyLines
component to visualize and explore your Elasticsearch graph data.
A KeyLines / Elasticsearch Architecture
Elasticsearch provides a REST API and works with the JSON data structure, so the KeyLines
integration architecture is very simple:
In this scenario users interact with KeyLines, which runs in the web browser, to raise events (e.g.
click, hover, right-click). These interactions with the graph interface send requests to the
Elasticsearch REST API. Elasticsearch returns the data as a JSON object, which is then styled and
presented in KeyLines.
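The request half of this flow can be sketched in a few lines. This is a minimal illustration, not the guide's own code: buildRequest and searchGraph are hypothetical names, the explore URL is passed in because the exact path depends on your Elasticsearch version, and the HTTP function is injected so that any fetch-like implementation can be used.

```javascript
// Sketch only: buildRequest and searchGraph are hypothetical names.

// Package a graph query as a POST request description.
function buildRequest(exploreUrl, queryBody) {
  return {
    url: exploreUrl,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(queryBody)
    }
  };
}

// httpPost is any fetch-like function: (url, options) -> Promise<Response>.
function searchGraph(httpPost, exploreUrl, queryBody) {
  var request = buildRequest(exploreUrl, queryBody);
  return httpPost(request.url, request.options).then(function (response) {
    if (!response.ok) {
      throw new Error("Elasticsearch request failed: " + response.status);
    }
    // Parsed JSON, ready to be converted into KeyLines items.
    return response.json();
  });
}
```

In a browser, the built-in fetch function could be passed as httpPost, for example from inside a chart event handler.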
Getting started with KeyLines and Elasticsearch
In this tutorial, we will create a simple KeyLines application to perform a search of our
Elasticsearch data. This is just the starting point. Once you have a functioning integration, you can
incorporate additional KeyLines visualization and analysis functionality.
We used a random data generator to produce a fake dataset of users. Then we imported the
generated users into Elasticsearch with Logstash, as a “user” type inside a “users” index.
  gender: "string",
  company: "string",
  eyes_color: "string"
}
To give you some idea of how this works, here is some of the HTML we would need on our page
to load the KeyLines component:
<!-- This is the HTML element that will be used to render the KeyLines component -->
<div id="kl"></div>
After that, the rest will be UI to interact with Elasticsearch.
Because we imported the data with Logstash, each user has an extra field: message. It holds the
raw line used for the import, and it looks like this:
For a graph search for the term ‘brown’, our data query would look like this:
{
  "query": {
    "query_string": {
      "default_field": "_all",
      "query": "brown"
    }
  },
  "controls": {
    "use_significance": true,
    "sample_size": 2000,
    "timeout": 5000
  },
  "connections": {
    "vertices": [
      {
        "field": "message",
        "size": 20,
        "min_doc_count": 3
      }
    ]
  },
  "vertices": [
    {
      "field": "message",
      "size": 20,
      "min_doc_count": 3
    }
  ]
}
In response to this we would receive a JSON object, which we can parse into KeyLines’ own JSON
format.
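Since only the search term changes between requests, the query can be produced by a small helper. The sketch below reuses the values from the query above; buildGraphQuery is a hypothetical name rather than part of KeyLines or Elasticsearch, and passing the field as a parameter is an assumption made for illustration.

```javascript
// Sketch only: buildGraphQuery is a hypothetical helper.
function buildGraphQuery(term, field) {
  // The same vertex definition is used for the seed vertices
  // and for the vertices they connect to.
  function vertexSpec() {
    return { field: field, size: 20, min_doc_count: 3 };
  }

  return {
    query: {
      query_string: { default_field: "_all", query: term }
    },
    controls: { use_significance: true, sample_size: 2000, timeout: 5000 },
    connections: { vertices: [vertexSpec()] },
    vertices: [vertexSpec()]
  };
}

// Produces the same structure as the query for 'brown' on the message field.
var brownQuery = buildGraphQuery("brown", "message");
```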
Step 6: Parse our result into the KeyLines format
The Elasticsearch response contains all the information we need to create a KeyLines input, so
parsing your JSON is a relatively simple process.
{
connections:[],
failures:[],
timed_out:false,
took:0,
vertices:[]
}
Inside the connections attribute, we will find the links, for example:
{
  doc_count: 14,
  source: 10,
  target: 2,
  weight: 0.005304290380952548
}
The source and target attributes are indices into the vertices array.
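Because source and target are array positions rather than identifiers, each connection has to be resolved against the vertices array before it can be drawn. A minimal sketch of that lookup follows; resolveConnection is a hypothetical name, and the sample response is made up for illustration.

```javascript
// Sketch only: resolveConnection is a hypothetical helper.
function resolveConnection(response, connection) {
  var from = response.vertices[connection.source];
  var to = response.vertices[connection.target];
  if (!from || !to) {
    throw new Error("Connection refers to a vertex index that does not exist");
  }
  return { from: from.term, to: to.term, docCount: connection.doc_count };
}

// Made-up response with two vertices and one connection between them.
var sampleResponse = {
  vertices: [
    { field: "message", term: "blue", weight: 0.8, depth: 0 },
    { field: "message", term: "brown", weight: 0.5, depth: 0 }
  ],
  connections: [{ source: 1, target: 0, doc_count: 14, weight: 0.005 }]
};

var resolved = resolveConnection(sampleResponse, sampleResponse.connections[0]);
// resolved.from is "brown" and resolved.to is "blue"
```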
Inside the vertices attribute, we will find the vertices themselves, for example:
{
  depth: 0,
  field: "message",
  term: "blue",
  weight: 0.8421388547845717
}
For this we just use the makeNode() and makeLink() functions to get our KeyLines input, e.g.:
    id: item.term,
    type: "node",
    t: item.term,
    e: e,
    c: "green",
    d: Object.assign({}, item)
  };
};

var makeLink = function (index, item, nodes) {
  var w = getLinkWidth(item);
  var node1 = nodes[item.source];
  var node2 = nodes[item.target];
  return {
    type: "link",
    id: "link_" + node1.term + "_" + node2.term,
    id1: node1.term,
    id2: node2.term,
    w: w,
    d: Object.assign({}, item)
  };
};

function loadChart(items) {
  chart.load({
    type: 'LinkChart',
    items: items
  }, function () {
    chart.layout("standard");
  });
}
Success!
In this example, we have added another request to count users returned in our search result. This
allows us to scale nodes and weight the links.
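The guide does not show how the counts are turned into sizes, so the following is only one possible approach: a linear mapping from a count onto a display range. The names scale, nodeSizeFor and linkWidthFor, and the output ranges (1 to 3 for node enlargement, 1 to 10 for link width), are assumptions chosen for illustration.

```javascript
// Sketch only: these helpers and their output ranges are hypothetical.

// Map value from [min, max] onto [outMin, outMax], clamping out-of-range input.
function scale(value, min, max, outMin, outMax) {
  if (max === min) {
    return outMin; // avoid dividing by zero when there is no spread
  }
  var clamped = Math.min(Math.max(value, min), max);
  return outMin + ((clamped - min) / (max - min)) * (outMax - outMin);
}

// Enlargement factor for a node, based on how many users matched its term.
function nodeSizeFor(userCount, maxUserCount) {
  return scale(userCount, 0, maxUserCount, 1, 3);
}

// Width for a link, based on the doc_count of the connection.
function linkWidthFor(connection, maxDocCount) {
  return scale(connection.doc_count, 0, maxDocCount, 1, 10);
}
```

A linear mapping is the simplest choice; if a few terms dominate the counts, a logarithmic mapping may give a more readable chart.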
For example, you may want to pull in nodes with their full relationships. This would be managed
by performing another server request asking for all elements in the relationships found. You would
also ask it to omit any related nodes that have already been returned; otherwise you will keep
getting the original node back over and over.
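One way to express such a follow-up request is sketched below. It assumes that the Graph explore API accepts include and exclude lists on vertex definitions, which you should confirm against the documentation for your Elasticsearch version. buildExpandQuery is a hypothetical name, and the controls and size values are copied from the earlier search query.

```javascript
// Sketch only: buildExpandQuery is a hypothetical helper, and the
// include/exclude options are assumed to be supported by your version.
function buildExpandQuery(field, term, knownTerms) {
  // Exclude everything already on the chart, including the node being expanded.
  var exclude = knownTerms.slice();
  if (exclude.indexOf(term) === -1) {
    exclude.push(term);
  }

  return {
    controls: { use_significance: true, sample_size: 2000, timeout: 5000 },
    // Start from the node the user chose to expand.
    vertices: [{ field: field, include: [term] }],
    // Ask only for neighbours that are not already loaded.
    connections: {
      vertices: [{ field: field, size: 20, min_doc_count: 3, exclude: exclude }]
    }
  };
}
```

The list of known terms could be collected from the ids of the nodes already loaded in the chart.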
You should now be ready to extend these with other functionality to help users explore and
understand their data. The KeyLines SDK site contains a fully documented API of functionality
for you to incorporate.
Try it yourself!
To find out more about KeyLines, or to start a free trial, just get in touch:
http://cambridge-intelligence.com/contact.