
The ultimate guide to graph visualization
Contents
What is graph visualization?
Why visualize graphs?
Real-world uses of graph visualization
Graph visualization solutions
Graph databases
Graph visualization best practices
• Graph visualization UX
• Graph data modeling
• Visual data modeling
• Graph visualization UI
Big data graph visualization techniques
Our data visualization toolkits
What is graph visualization?
Graph visualization, sometimes called ‘link analysis’ or ‘network
visualization’, is the process of visually presenting connections (called links
or edges) between entities (called nodes or vertices) and properties.

It doesn’t matter how small the dataset is. If connections exist in the data, there’s
value in visualizing them.

Why visualize graphs?


Graph visualization enables analysts to intuitively identify trends, outliers and patterns of behavior,
helping them make the right decisions, fast. Graph visualization is so effective because it’s:

Intuitive: the node-link model instantly makes sense.

Fast: our brain is great at finding trends, patterns and outliers when data is presented in a tangible format.

Flexible: the world is densely connected. As long as there’s an interesting relationship in your data
somewhere, you’ll find value in graph visualization.

Insightful: exploring connected data interactively allows users to gain deeper knowledge, understand
context and ask more questions, compared to static visualization or just looking at raw data.

Investigating social structures and relations in complex email data with one of our toolkits

What are the real-world uses of graph visualization?
Graph visualization helps people make sense of complex connected data
across many domains and industries. Some of the most popular uses
include:

Security & intelligence: Distilling complex connected data into critical intelligence and insight.

Anti-fraud: Detecting or investigating fraud in finance, insurance or online activity.

Cyber security: Tracking the behavior of cyber threats and analyzing incident forensics.

Law enforcement: Enabling detailed pattern of life and behavioral analysis.

Compliance: Ensuring regulatory compliance through effective data analysis.

Infrastructure: Monitoring performance and faults plus root cause analysis.

Customer 360: Understanding your customer behavior better.

Pharmaceuticals: Analyzing connections between agents, diseases, drugs & trials.

Social networks: Visualizing dynamic connections between social actors.

You can learn more about the different use cases for graph visualization on our website.

Graph visualization solutions
If you want to start visualizing graph data, you’ll want to know what options are available to you.

Graph visualization solutions fall into 3 main categories:

• an open source code library

• an off-the-shelf application

• a commercial graph visualization software development kit (SDK)

Open source code libraries offer great flexibility to build a custom application and deploy it to anyone
without additional costs. But if your developers have any technical problems, or bugs appear, they’ll have
to dig into the code to find the bugs themselves or wait for the community (which may no longer exist) to
implement fixes. They’re also typically very basic and lack the advanced functionality your users will need
to explore and understand complex graph datasets at scale.
Off-the-shelf graph visualization applications allow users to navigate and query the graph data without
any programming. They are intuitive tools for code-free investigation and insight, but they might not meet
your specific functional requirements. Most applications are standalone - you can’t embed them easily into
your products and tools, disrupting your users’ workflows.
Commercial graph visualization SDKs come with a shallower learning curve than the open source code
libraries. They provide detailed documentation and a faster developer experience. Our graph visualization
SDKs, KeyLines and ReGraph, make it easy to build and deploy high-performance graph visualization tools
quickly.
Every aspect of your application can be tailored to suit you, your data and the questions you need to
answer. The SDKs fit with any browser, device, server or database and come with clear tutorials, demos and
API documentation. You’ll also get help from our developers on technical queries, bugs and more.

Our graph visualization SDKs: KeyLines and ReGraph

Graph databases
What is a graph database?
A graph database stores and queries data as a network of nodes connected
by edges, with properties attached to both. Graph databases are optimized
for the job of representing graphs, treating connections as first-class
citizens, as important as the data itself.

Why use a graph database?


You don’t have to use a graph database to visualize connected data. Our toolkits are database agnostic,
so they work with data from any source: graph databases, NoSQL data stores, triple stores, SQL
databases and CSV files or even just from memory. However, graph databases offer several advantages for
storing connected data, including:

Greater performance – compared to NoSQL stores or relational databases. Graph databases avoid
expensive ‘join’ operations and give faster access to connected data.

Lower latency – as the nodes and links ‘point’ to one another, related records can be traversed by following
those pointers directly, so response time depends mainly on how much of the graph a query touches rather
than on the total size of the database.

Whiteboard friendliness – the graph format is most likely to resemble your real-world data, meaning you
can avoid complex data mapping and modeling exercises.

Good for semi-structured data – graph databases are typically schema-free or schema-optional, meaning
patchy data, data with exceptional attributes, or data whose structure may change, can be more readily
accommodated.
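
To make the ‘join’ point concrete, here is a minimal sketch in plain Python. It is a toy comparison, not a model of how any particular database works: the `friend_rows` data and both function names are made up for illustration, and real relational databases use indexes rather than full scans.

```python
from collections import defaultdict

# Relational style: relationships live in one table of rows.
# (Hypothetical sample data.)
friend_rows = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("ann", "dan")]


def neighbors_by_scan(person):
    """Find a person's neighbors by checking every relationship row."""
    return sorted(b for a, b in friend_rows if a == person)


# Graph style: each node keeps direct references to its own neighbors.
adjacency = defaultdict(list)
for a, b in friend_rows:
    adjacency[a].append(b)


def neighbors_by_adjacency(person):
    """Find a person's neighbors by following the node's own pointers."""
    return sorted(adjacency.get(person, []))
```

Both functions return the same neighbors. The difference is that the scan touches every row in the table, while the adjacency lookup only touches the links attached to the node in question.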

Some of the popular graph databases


With data stored in a native graph format, it’s easier to map it to your visual model and create high-
performing graph visualizations. There are many graph databases available, and our toolkit technology
integrates seamlessly with them all. Learn more about them on our website.

Our toolkits integrate seamlessly with all graph databases

Graph visualization best practices
Graph visualization UX
For your graph visualization application to be a success, you need to do more
than choose the right data store and front-end technology. You also need to think
about user experience (UX). There are some important things to consider when
designing your graph visualization UX.

What is UX design?
‘User experience design’ is often used interchangeably with terms such as
‘user interface design’ and ‘usability’. However, while usability and user
interface (UI) design are important aspects of UX design, they are subsets of
it – UX design covers a vast array of other areas, too.

Know your user – before you get started with any design project, you need to get the basics down first.
That means understanding your user. When you understand their problems and questions, you’ll be able
to design an effective visualization strategy.

User experience (UX) design focuses on understanding and delivering what users want. The aim is to
make them feel good about working with your graph visualization applications.

User-centered design is an iterative process where you take an understanding of the users and their context as a starting
point for all design and development.

The 4 cornerstones of graph visualization UX
The design decisions you make will depend entirely on your own scenario (data, users, their questions) but
there are four cornerstones you should keep coming back to.

With each decision, ask yourself: “Is this experience…”:

Intuitive – this might sound obvious, but it’s crucial. An intuitive experience is all about trust between an
application and its users. If a user trusts the graph visualization to be an accurate representation of their
data, their experience will be much more insightful.

Consistent – ensure a consistent experience and styling throughout the application. Don’t forget consistency
with other applications in your users’ stack too – the visualization component is unlikely to be the only
technology they use each day.

Traceable – this one comes back to trust. To trust their graph visualization, the user needs to understand
how it was generated. It might be tempting to run filtering, scoring and layouts entirely in the back-end,
but showing those processes will reassure your user that their results are accurate and trustworthy.
Animation is a great way to do this.

Reversible – don’t leave your users in fear of an accidental click trashing an afternoon’s work. Give them a
quick and easy way to undo and redo their actions.
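
One common way to make an experience reversible is to keep a history of chart states. The sketch below is a minimal illustration in plain Python, assuming each chart state can be stored as a single value; the `History` class is hypothetical and not part of any toolkit API.

```python
class History:
    """Undo/redo history of chart states (hypothetical helper)."""

    def __init__(self, initial_state):
        self._undo = [initial_state]  # past states, current state last
        self._redo = []               # states that were undone

    @property
    def current(self):
        return self._undo[-1]

    def apply(self, new_state):
        """Record a new state; a fresh action clears the redo stack."""
        self._undo.append(new_state)
        self._redo.clear()

    def undo(self):
        """Step back one state, if there is an earlier one."""
        if len(self._undo) > 1:
            self._redo.append(self._undo.pop())
        return self.current

    def redo(self):
        """Re-apply the most recently undone state, if any."""
        if self._redo:
            self._undo.append(self._redo.pop())
        return self.current


# Usage: the user deletes node "c" by mistake, then undoes it.
history = History(frozenset({"a", "b", "c"}))
history.apply(frozenset({"a", "b"}))
restored = history.undo()
```

Storing whole states keeps the sketch short. For large charts, you would more likely store the individual actions and their inverses.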

Once you have done your research about your users and their needs, you’re ready to deep dive into the
graph visualization design and development. But don’t forget, this is an iterative process.

Rectangular combos, created with one of our toolkits

Graph data modeling
Data modeling is when you decide how to represent the entities, relationships
and properties in your data. It’s easy to skip over, but a well-designed data model
is the foundation of your graph visualization experience. Taking time
to get it right will help users understand their charts, and make your later design
decisions much easier.
During the graph data modeling process you decide which entities in your dataset should be nodes,
which should be edges and which should be discarded. The result is a blueprint of your data’s entities,
relationships and properties. You can use that blueprint to create a visualization model for your charts.

• Nodes are the fundamental units of our data. We design our entire model around these entities.

• Links are the relationships between nodes.

• Properties are descriptive characteristics of nodes and links, but aren’t important enough to become
nodes themselves. For example, a person’s date of birth.
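
As a minimal sketch, the three building blocks could be written down in Python like this. The `Node` and `Link` classes, the field names and the phone-call sample data are all assumptions made for illustration, not a toolkit or database format.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity in the model, for example a person."""
    id: str
    type: str
    properties: dict = field(default_factory=dict)


@dataclass
class Link:
    """A relationship between two nodes, referenced by their ids."""
    source: str
    target: str
    type: str
    properties: dict = field(default_factory=dict)


# Date of birth is descriptive, so it is a property rather than a node.
alice = Node("p1", "person", {"name": "Alice", "date_of_birth": "1990-04-12"})
bob = Node("p2", "person", {"name": "Bob", "date_of_birth": "1985-11-03"})
call = Link("p1", "p2", "called", {"duration_seconds": 42})

nodes = {n.id: n for n in (alice, bob)}
links = [call]
```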

Note
Creating a graph model isn’t always the answer. Some datasets don’t feature connections, so there
isn’t much value in applying the node-link structure to them.

You can’t design a graph data model in a hurry. Getting it right takes time, but it’s worth doing properly
with a user-centered approach that your analysts will thank you for. We’ve put together graph data
modeling best practice guides to help you.

Once you’ve chosen a winning graph data model that’s both simple and practical, you can start translating
it into your visual model.

Visual data modeling
The visual model determines how the data model is represented on the chart.
You’ll need to design a model that’s clear, clutter-free and helps users to instantly
recognize what they’re seeing.

Note
The data model and the visual data model are rarely the same, and that’s a good thing. Your data
model is designed to work well for your database. Your visual model should be designed for your
users, their data, and the questions they need to answer.

Let’s say you’re designing a healthcare graph visualization. Your data model might include entities like
doctors, patients, and appointments. You could model this visually as:

To validate your graph data model, think about your users. What do they need to know? Does this model
make it easier for the user to answer simple questions, like ‘how many patients has a doctor seen?’
and ‘how many appointments has a patient made?’ Not really.

So instead of modeling appointments as nodes, they could simply be links between patients and doctors
– removing a third of the nodes from the chart. All remaining columns can be added as properties, but
only if they offer useful information. Don’t add properties to your model just because they’re in your
database. You need to make decisions about what’s going to add meaning to the visualization and avoid
unnecessarily cluttering the chart.

Modeling appointments as links removes clutter from the chart
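
The healthcare example can be sketched in a few lines of Python. The appointment rows, field names and helper functions below are hypothetical; the point is only to show appointments becoming links, and how that makes the ‘how many patients has a doctor seen?’ question a simple count.

```python
# Hypothetical appointment records, as they might come from a database.
appointments = [
    {"id": "a1", "patient": "pat1", "doctor": "doc1", "date": "2024-01-05"},
    {"id": "a2", "patient": "pat2", "doctor": "doc1", "date": "2024-01-06"},
    {"id": "a3", "patient": "pat1", "doctor": "doc2", "date": "2024-01-09"},
]


def to_visual_model(rows):
    """Turn each appointment into a patient-doctor link, not a node."""
    nodes = {}
    links = []
    for row in rows:
        nodes[row["patient"]] = {"id": row["patient"], "type": "patient"}
        nodes[row["doctor"]] = {"id": row["doctor"], "type": "doctor"}
        links.append({
            "id": row["id"],
            "source": row["patient"],
            "target": row["doctor"],
            "date": row["date"],  # kept as a link property
        })
    return list(nodes.values()), links


def patients_seen(doctor_id, links):
    """Count the distinct patients linked to a doctor."""
    return len({link["source"] for link in links if link["target"] == doctor_id})


nodes, links = to_visual_model(appointments)
```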

Graph visualization UI
At this step, we’re going to look into the most important graph visualization user
interface (UI) design decisions.

What is UI design?
UI design is an essential component of UX. It focuses on the intuitive
interactions and beautiful styling that makes your application insightful and
a joy to use.

Iconography
Icons are a versatile and stylish way to represent important details, grab the user’s attention and add real-
world context.

As with any design element, you should select icons carefully. Try to follow these guidelines:

• Stick to universally recognizable icons – ideally, your users should make the connection between the
icon and what it represents without conscious thought.

• Make sure the icons are sufficiently distinct from each other. Color can help with this.

• Avoid detail that may become illegible, and check the icons are recognizable even when you’re
zoomed out.

• Check the icons are correctly aligned. Our toolkits come with alignment options to help you with this.

Our toolkits work with any font icon set you choose.

A chart using font icons on nodes, created with one of our toolkits

Color
A carefully selected color palette will harness the pre-attentive processing
powers of the human brain, making insight clearer and easier to find. A badly
chosen color palette obscures the information your users need to understand, and
makes your data visualization less effective and harder to use.
Some handy color palette tips to keep in mind include:

• Choose one single attribute to represent with color – a node or link might have dozens of properties,
but limit your use of color representation to one property so you don’t overwhelm the user.

• Choose a scale – if your chosen property follows a numeric scale, choose a color or two to represent
the top and bottom ends of the scale.

• Limit the number of hues – if the property tied to color is qualitative rather than quantitative, limit
the number of groups you want to represent. Around 7 is the maximum number of colors a user can
process easily. More than 12 and your users will struggle to differentiate between them.

• Use an intuitive color scheme – if your chart shows interactions between a blue team and a red team,
those colors will automatically make sense to the user.

• Defer to the experts – there are plenty of tools out there to help you find the perfect color palette to
meet your requirements. We’re huge fans of Color Brewer.

Find out more in Choosing colors for your graph visualization

A color palette of the social network visualization created using Adobe

Chart interactions
The tools you design should ‘just work’ in an intuitive way, giving your users fast insight into the networks
of connected data you provide them with.

A visualization without interactions is never going to be as helpful as one you can manipulate and
play with, particularly when you’re dealing with complex connected data. Chart interactions include
clicks, drags, zooms, keyboard inputs and other gestures users make as they explore and play with
their connected data. Ensuring a consistent and intuitive experience is key to creating effective chart
interactions.

Our toolkits let you define how your users interact with the chart. You can customize what happens with
every click, gesture and keyboard input.

Accessibility
Another important UI factor that’s often overlooked is accessibility. Fully accessible digital content is
essential to provide equal access and opportunity to people with diverse abilities.

Accessibility must be a key consideration while identifying the needs of your users. It should feed into early
design phases, making accessibility a feature of, not an add-on to, the final product.

Accessibility gives you an opportunity to set standards and perform better than your competitors.

• Make good use of keyboard controls and shortcuts – Keyboard shortcuts are especially beneficial
for people who have a motor impairment, as these users often have difficulty with the fine motor
movements required for using a mouse. Experienced users also prefer keyboard shortcuts because they
help them perform tasks more quickly.

• Create text versions of charts – Providing a text alternative for charts means content can be
accessed by people with a vision impairment using a screen reader or refreshable Braille device, or
people with cognitive impairment using a text-to-speech system.

• Choose colors carefully – It’s important to choose colors carefully when designing your graph
visualization tools. Users with color vision deficiencies or visual impairments can have difficulty
distinguishing between colors with similar hue, saturation and lightness.
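
As a rough illustration of the ‘text versions of charts’ tip, the sketch below builds a plain-text summary from a list of nodes and links. The `describe_chart` function and the data shape are assumptions made for this example. A real text alternative would be written around what your users need to learn from the chart.

```python
def describe_chart(nodes, links):
    """Build a plain-text summary that a screen reader can read out."""
    # Fall back to the node id when no label is provided.
    labels = {n["id"]: n.get("label", n["id"]) for n in nodes}
    lines = [f"Nodes: {len(nodes)}. Links: {len(links)}."]
    for link in links:
        source = labels[link["source"]]
        target = labels[link["target"]]
        lines.append(f"{source} is linked to {target}.")
    return "\n".join(lines)


summary = describe_chart(
    [{"id": "a", "label": "Alice"}, {"id": "b", "label": "Bob"}],
    [{"source": "a", "target": "b"}],
)
```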

Find out more in Graph visualization tools and accessibility

A screenshot of the Use Keyboard Shortcuts demo

Our products have a number of visual styles you can harness in your application,
like node sizing and styling, link weighting, labels, glyphs, font icons, and so on.
Use them wisely and separating data insight from data noise will be much faster
and simpler.

Formatting: Highlight key players by coloring or resizing nodes, or show numeric proportions of data,
such as different centrality measures, with donuts around the nodes.

Glyphs, labels and annotations: Provide additional information, communicate interesting
characteristics and help differentiate between chart items.

Fine-grain control: Take control of other visual attributes, from arrow-head size, to label height, to
link behavior when it approaches a node.

Charts: Customize the entire chart to meet the requirements of your application – the background
color, watermark text or image, chart logo and navigation controls can all be modified.

Big data graph visualization techniques
Once you’re happy with your data model and visual design, the next challenge
you need to consider is scale. How will your users deal with their biggest graph
datasets?
Note
There’s a common misconception that chart insight increases with the number of nodes - if you
can infer something useful from just a handful of nodes, surely visualizing 100,000 will generate
even more insight? That’s rarely the case. The human brain can only process a certain amount
of information at once, so you need to design an approach that lets the user focus on the most
important parts of their graphs.

While our graph visualization toolkits can visualize huge graphs, it’s usually more helpful to give users a way to reduce the scale of
data in their charts

To provide your users with something more useful, think about the data funnel. Using back-end data
management and front-end interactions, the funnel reduces billions of data points into something a user
can comprehend.

The data funnel presents steps to bring big data down to a human scale

Back-end filtering
There’s no point visualizing your entire graph database instance. You want to remove as much noise as
possible, as early as possible. Filtering with database queries is an incredibly effective way to do this.

Our toolkits’ flexibility means you can give users some nice visual ways to create custom filtering queries,
like sliders, tick-boxes or selecting from a list of cases. In this example, we’re using queries to power a
‘search and expand’ interaction:

This search and expand model lets users visualize and explore huge volumes of graph data in a manageable way
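
The search and expand idea can be sketched without a database. In this minimal Python example, an in-memory dictionary stands in for the back end, and `search` and `expand` stand in for the queries an application would send. All names and data are hypothetical.

```python
# Stand-in for the full graph held in the back end.
GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice"],
    "dave": ["bob", "erin"],
    "erin": ["dave"],
}


def search(term):
    """Return the ids of nodes whose name contains the search term."""
    return [node for node in GRAPH if term.lower() in node.lower()]


def expand(visible, node_id):
    """Add one node's direct neighbors to the set of visible nodes."""
    return set(visible) | {node_id} | set(GRAPH.get(node_id, []))


# The chart starts with the search hits only, then grows on demand.
visible = set(search("ali"))
visible = expand(visible, "alice")
```

The user only ever sees the nodes they have asked for, so the chart stays small even when the underlying graph is not.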

Back-end aggregation
Once filtering techniques are in place, you should consider aggregation. There are two ways to approach
this.

• Data cleansing to remove duplicates and errors: This can be time-consuming but, again, queries are
your friend. Functions like Cypher’s ‘count’ make it really easy to aggregate nodes in the back end.

• Data modeling to remove unnecessary clutter: We’ve already covered this in the previous sections.
Find out more in Graph Data Modeling.

With a few simple aggregation decisions, it’s possible to reduce tens of thousands of nodes into a few
thousand.
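
To show the effect of a count-style aggregation, here is a minimal Python sketch that merges duplicate links into one link carrying a count. It imitates in memory what a counting query would do in the back end; the `raw_calls` data and the `aggregate_links` name are assumptions for illustration.

```python
from collections import Counter

# Hypothetical raw records: one row per phone call.
raw_calls = [
    ("ann", "bob"),
    ("ann", "bob"),
    ("bob", "cat"),
    ("ann", "bob"),
    ("bob", "cat"),
]


def aggregate_links(pairs):
    """Merge duplicate source/target pairs into one link with a count."""
    counts = Counter(pairs)
    return [
        {"source": source, "target": target, "count": count}
        for (source, target), count in sorted(counts.items())
    ]


summary_links = aggregate_links(raw_calls)
```

Five raw records become two links, and the count can then drive a visual property such as link width.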

Visual data modeling


By now, you should have reduced 1,000,000+ nodes to a few thousand. This is where the power
of visualization really shines. We can use visual modeling to simplify the chart further, as we’ve seen in
the previous section on Visual Data Modeling.

For example, here we use glyphs, link sizing and donuts to represent the data stored in
collapsed links:

• glyphs to show the platforms used

• link sizing to show the volume of activity on each platform

• donuts to show the relative use of each platform.

Using glyphs, link sizing and donuts to represent the data stored in collapsed links.

Filters and combos
Now your users have the relevant nodes and links in their graph visualization, you should give them the
tools to declutter and focus on the key items of interest.

Filters are great for this. For a great user experience (and better performance) consider presenting
users with an already-filtered view, with the option to bring in more data.

Another option is to use our combos functionality to group nodes and links, giving a clearer view of a large
dataset without actually removing anything from the chart. It’s an effective way to simplify complexity,
but also to offer a ‘detail on-demand’ user experience that makes graph insight easier to find.
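
The grouping idea can be sketched in plain Python as follows. This is not the combos API: `group_into_combos`, the `country` property and the sample nodes are hypothetical, and the sketch only shows how nodes might be bucketed by a shared property while every node is kept.

```python
from collections import defaultdict


def group_into_combos(nodes, key):
    """Group node ids by a shared property; no node is discarded."""
    combos = defaultdict(list)
    for node in nodes:
        combos[node.get(key, "unknown")].append(node["id"])
    return dict(combos)


nodes = [
    {"id": "a", "country": "UK"},
    {"id": "b", "country": "US"},
    {"id": "c", "country": "UK"},
    {"id": "d"},  # missing property goes into an "unknown" group
]

combos = group_into_combos(nodes, "country")
```

Each group can then be drawn as a single item that opens to show its members, which is the ‘detail on-demand’ behavior described above.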

Social network analysis algorithms


Social network analysis is a way to understand how networks behave, and uncover the most important
nodes within them.

Powerful social network analysis algorithms cut through noisy social network data to reveal the parts of
the network that deserve most attention.

A node’s centrality is a measure of its prominence or structural importance in a network. A high centrality
score could indicate power, influence, control, or status. Finding out which node is the most ‘central’ can
help you disseminate information in a network faster, stop epidemics, protect a network from breaking,
identify suspected terrorists and much more.

Our graph visualization toolkits include a range of centrality measures to identify the most influential
nodes in a social network. Learn more about them in our blog post on Social Network Analysis Measures.
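
Degree centrality is one of the simplest centrality measures: it counts how many links touch each node. The sketch below is a minimal plain-Python version, assuming an undirected graph with no duplicate links or self-loops. It is not the toolkits’ implementation, and the function name is made up.

```python
def degree_centrality(nodes, links):
    """Return each node's degree divided by the maximum possible degree."""
    degree = {node: 0 for node in nodes}
    for source, target in links:
        degree[source] += 1
        degree[target] += 1
    scale = len(nodes) - 1  # a node can link to every other node at most
    return {
        node: (count / scale if scale > 0 else 0.0)
        for node, count in degree.items()
    }


# A small star-shaped network: "hub" is linked to everyone else.
scores = degree_centrality(
    ["hub", "x", "y", "z"],
    [("hub", "x"), ("hub", "y"), ("hub", "z")],
)
most_central = max(scores, key=scores.get)
```

Other measures, such as betweenness or closeness, look at paths through the network rather than direct links, so they can rank nodes quite differently.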

Time-based analysis
Graph data is rarely static. Everything happens at a point or duration in time, and networks evolve as
connections are formed and broken. Understanding the time dimension in your massive connected data
helps you uncover insight.

Our graph visualization toolkits come with a time bar component. Exploring graphs with our time bar
component helps analysts understand how networks form, cluster, fracture and dissolve over time. Users
can explore their dynamic graph data in an interactive and intuitive way without getting overwhelmed.

What is the time bar?


The time bar combines a histogram, showing overall graph activity, with trendlines,
which show specific node or sub-network activity. Users can navigate, filter
and play back their time-based graphs using scale and navigation controls.

The time bar, shown below the graph visualization chart, provides a summary of your graph data over time, as well as trendlines
representing individual node and sub-network activity.
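
The data behind a histogram like this is a count of events per time bucket. Here is a minimal sketch in plain Python, assuming events carry ISO 8601 timestamps; the function names and sample data are hypothetical and unrelated to the time bar component’s API.

```python
from collections import Counter
from datetime import datetime


def activity_histogram(timestamps, bucket_format="%Y-%m"):
    """Count events per time bucket (monthly by default)."""
    counts = Counter(
        datetime.fromisoformat(ts).strftime(bucket_format) for ts in timestamps
    )
    return dict(sorted(counts.items()))


def filter_by_range(events, start, end):
    """Keep events whose time falls within [start, end)."""
    lower = datetime.fromisoformat(start)
    upper = datetime.fromisoformat(end)
    return [
        event for event in events
        if lower <= datetime.fromisoformat(event["time"]) < upper
    ]


events = [
    {"id": "e1", "time": "2024-01-05T10:00:00"},
    {"id": "e2", "time": "2024-01-20T09:30:00"},
    {"id": "e3", "time": "2024-02-01T12:00:00"},
]

histogram = activity_histogram(event["time"] for event in events)
january = filter_by_range(events, "2024-01-01", "2024-02-01")
```

Running the same count over the events of a single node or group gives the data for a trendline.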

If you’re interested in building interactive timeline visualizations, take a look
at KronoGraph, our timeline visualization toolkit.

Graph layouts
The final step is to help your users uncover insight. Automated graph layouts are great for this.

A good graph layout goes beyond simply detangling links. It should also help you see the patterns,
anomalies, and clusters that direct the user towards the answers they’re looking for. With an effective,
consistent and powerful graph layout, your users will find that answers start to jump out of the chart.

Our toolkits offer powerful graph layouts to suit every graph:

Standard layout is an efficient, force-directed layout, meaning it uses physics forces to ensure links are a
consistent length with nodes and links distributed evenly, overlapping as little as possible.

Organic layout detangles complex networks by spreading nodes and links apart, arranging multiple
components in a circular shape, with larger components in the center.

Sequential layout is a good way to display data with a clear sequence of links between distinct levels of
nodes.

Hierarchy layout produces a family tree of nodes, with child nodes positioned in layers leading from their
parents.

Structural layout is similar to the standard layout, but nodes with similar attributes are grouped together
in fans around a central node or cluster.

Radial layout arranges nodes in concentric circles around the original subject in a radial tree. Each
‘generation’ of node becomes a new orbit extending outwards from the original parent as the dependency
chain grows.

Lens layout pushes highly-connected nodes into the center, and forces other nodes to the periphery to give
an attractive ‘fish-eye lens’ view.
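
To illustrate the idea behind a radial arrangement, the sketch below places each ‘generation’ of nodes on its own ring, using a breadth-first search from the chosen root. It is a deliberately simplified illustration in plain Python, not the toolkits’ layout algorithm: nodes are spaced evenly around each ring with no attempt to keep children near their parents, and the function name and `ring_spacing` parameter are assumptions.

```python
import math
from collections import deque


def radial_positions(adjacency, root, ring_spacing=100.0):
    """Place nodes on concentric rings by their distance from the root."""
    # Breadth-first search gives each reachable node its generation.
    depth = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, []):
            if neighbor not in depth:
                depth[neighbor] = depth[current] + 1
                queue.append(neighbor)

    # Collect the members of each ring.
    rings = {}
    for node, level in depth.items():
        rings.setdefault(level, []).append(node)

    # Spread each ring's members evenly around its circle.
    positions = {}
    for level, members in rings.items():
        radius = level * ring_spacing
        for index, node in enumerate(members):
            angle = 2 * math.pi * index / len(members)
            positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


adjacency = {
    "root": ["a", "b"],
    "a": ["root", "c"],
    "b": ["root"],
    "c": ["a"],
}
positions = radial_positions(adjacency, "root")
```

A production layout would also allocate angular space per subtree to reduce link crossings, and would handle nodes that are not connected to the root.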

Find out more on our graph layouts page.

Our data visualization toolkits
At Cambridge Intelligence, we build data visualization tools that make the world a safer place.

From law enforcement to cyber security and fraud detection, we work with organizations around the
globe. Every day, thousands of analysts rely on our software to ‘join the dots’ in data and uncover hidden
threats.

With our help, it’s quick and easy to build game-changing data visualizations and deploy them anywhere,
to anyone.

KeyLines and ReGraph are two software development kits (SDKs) we’ve built to help developers create
powerful graph visualization tools. KronoGraph, our toolkit for timeline visualization, lets you build fully-
interactive timelines that reveal how events unfold and how they’re linked.

To learn more, or to register for a free trial, visit our website.

KeyLines is a graph visualization toolkit for JavaScript developers. Add graph visualization to your
applications that work anywhere, as part of any stack.

ReGraph is a graph visualization toolkit for React developers. ReGraph’s data-driven API makes it quick and
easy to add graph visualizations to your React applications.

KronoGraph is a toolkit for building timelines that drive investigations. With KronoGraph it’s easy to build
interactive, scalable timelines to explore evolving relationships and unfolding events.

cambridge-intelligence.com USA +1 (775) 842-6665 UK +44 (0)1223 362 000


Cambridge Intelligence Ltd, 6-8 Hills Road, Cambridge, CB2 1JP
