Social Network Analysis and Object Attribution With Maltego 3


16 FEBRUARY 2015 / MALTEGO


I wrote this article with a fellow student of mine, Petter.

Social Network Analysis (SNA) is becoming increasingly widespread now that online social networks like LinkedIn and Facebook are growing in popularity [1, 2]. This kind of analysis has traditionally been a manual task, in which investigators connect individuals and events more or less by hand. However, the vast amount of data available calls for tools that automate the detection, recognition and visualization of these connections on a large scale. Maltego 3.0 is such a tool. In this paper we review the different aspects and applications of SNA and seek to discover how Maltego may help automate information gathering and structuring. Before we can review how Maltego can help us with SNA, we need to understand what problems it aims to solve. The rest of the article is therefore structured as follows. In the next section, work related to this article is reviewed. Section II reviews some of the foundations of SNA as well as how such analysis relates to digital forensics. Section III provides an introduction to what Maltego 3.0 is and why it exists. Section IV looks at the different approaches to SNA and how Maltego can be used in this regard. Finally, conclusions and proposals for further work are given in section V.

In this work we are using the community edition of Maltego 3.0. This is a free version of the application with a couple of limitations. The limitation that affected our work is that we were only allowed to generate 12 vertices per transformation, which made our examples smaller than we initially intended.

A. Related work
Only a few scientific articles have been published specifically about Maltego. Danny Bradbury has recently written a couple of articles about SNA in which Maltego was used to gather and structure the data [11, 12]. Despite the limited research on Maltego, there is solid scientific research on SNA in general. The book by Wasserman and Faust from 1994 on Social Network Analysis is probably the most cited publication on the topic to date [1]. Many of the techniques reviewed throughout this paper are based on their work. The work of Chen [9], Xu [6] and Fard and Ester [1] has provided good foundations for understanding the applications and limitations of SNA in the investigation of criminal groups.

Background

A. Digital evidence
The term digital evidence has been defined by Carrier and Spafford [7] as any digital data that contains reliable information that supports or refutes a hypothesis about the incident. They define electronic evidence as probative information stored or transmitted in digital form. The notion also includes some key principles, which boil down to the accuracy, reliability and integrity of the evidence.

The way such evidence is handled is often referred to as the chain of custody. A particularly important mechanism in this regard is the use of checksums, or digital fingerprints, which safeguard the integrity of the data. Other important data in this regard are timestamps and the forensic expert's own documentation.
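
To make the role of checksums concrete, here is a minimal Python sketch that fingerprints a collected artifact with SHA-256 and records when it was acquired. The function names and the record layout (`source`, `sha256`, `acquired_utc`) are hypothetical. A real chain-of-custody log would follow the investigating organisation's own procedures.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint_artifact(data: bytes, source: str) -> dict:
    """Return a custody record for one collected artifact (hypothetical layout)."""
    return {
        "source": source,                                        # where the data was collected
        "sha256": hashlib.sha256(data).hexdigest(),              # digital fingerprint
        "acquired_utc": datetime.now(timezone.utc).isoformat(),  # acquisition timestamp
    }


def verify_artifact(data: bytes, record: dict) -> bool:
    """Re-hash the data and compare it with the fingerprint stored at acquisition."""
    return hashlib.sha256(data).hexdigest() == record["sha256"]


if __name__ == "__main__":
    page = b"<html>profile page as downloaded</html>"
    rec = fingerprint_artifact(page, "https://example.com/profile")
    print(verify_artifact(page, rec))         # True: the copy is unchanged
    print(verify_artifact(page + b" ", rec))  # False: any alteration changes the hash
```

Any change to the data, even a single added byte, produces a different hash. Re-verifying the fingerprint therefore shows whether the evidence still has the form it had when it was acquired.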

Evidence integrity includes digital fingerprints and Order of Volatility (OoV). When considering integrity, preservation is the most important aspect: the evidence must not be altered, or parts of it intentionally left unpreserved, relative to the form it had before it was acquired. The notion of forensic soundness is often used, meaning that source data is not altered, every bit is copied and no data is added to images. When it comes to collecting data from open sources, this is a challenge, since it is difficult to prove what form the data had when the object or subject in question interacted with it, and who interacted with it last. The latter brings several other elements into play as well:

Can we trust the host where the data is collected from?

Does the information correlate to other data?

The principles in this section are central to the way Maltego can be used for collecting digital evidence for use in court cases.

Tools that collect and process open source data are commonly called crawlers. There are several ways of constructing such crawlers for efficiency and accuracy, e.g. the one described by Fard and Ester [1].
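
At its core, a crawler of this kind is a breadth-first traversal of friend lists. The sketch below assumes a `fetch_friends` callable that stands in for whatever scraping or API access is actually used; that callable is hypothetical here. The crawl stops at a fixed depth so that it stays bounded. This is a generic illustration, not the crawler described by Fard and Ester.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seed: str,
          fetch_friends: Callable[[str], Iterable[str]],
          max_depth: int = 2) -> list:
    """Breadth-first crawl from `seed`, returning the observed (from, to) edges."""
    seen = {seed}
    edges = []
    queue = deque([(seed, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth == max_depth:
            continue                          # do not expand beyond the depth limit
        for friend in fetch_friends(user):
            edges.append((user, friend))      # record every observed connection
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, depth + 1))
    return edges


if __name__ == "__main__":
    # Hypothetical friend lists standing in for scraped profile pages.
    FRIENDS = {"alice": ["bob", "carol"], "bob": ["alice", "dave"], "dave": ["erin"]}
    print(crawl("alice", lambda u: FRIENDS.get(u, []), max_depth=2))
```

The resulting edge list is exactly the kind of output that would later be turned into graph objects. Any error in `fetch_friends` propagates directly into the graph, which is why the crawler matters for the integrity discussion that follows.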

Even though external scripts do not directly affect the evaluation of Maltego, an open source intelligence framework is, for the most part, built on crawlers (a set of scripts that crawl the web and generate the graph objects used by Maltego). This has a large impact on the accuracy, reliability and integrity of Maltego itself, as argued in the previous section.

B. Graph Theory
Graph theory has proven to be an effective abstraction for large datasets. Problems such as the NP-hard travelling salesman problem (TSP) can be expressed in terms of graph theory.

Graphs, first documented in 1736, are a mathematical way of representing sets of objects. These objects, named vertices, are connected by symmetric (undirected) or asymmetric (directed) edges. The degree of a vertex is the number of edges connected to it (e.g., a degree of two means that two edges are connected to the vertex). While graphs are the fundamental concept in networks, there are several other important notions as well.
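
As a small illustration of the degree definition, the sketch below counts the degree of every vertex in an undirected edge list. The vertex names are arbitrary.

```python
from collections import Counter


def degrees(edges: list) -> Counter:
    """Undirected degree: how many edge endpoints touch each vertex."""
    deg: Counter = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1   # a self-loop (u == v) therefore adds 2, as in the usual definition
    return deg


if __name__ == "__main__":
    print(degrees([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]))
```

In this example C has degree three, A and B have degree two, and D has degree one.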

Random graphs are generated by random processes. In addition to graphs as described above, some probability theory is applied as well. Random graphs are typically seen in nature and in unpredictable human behaviour.
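
One classic random-graph model is the Erdős-Rényi G(n, p) model, in which each possible edge is included independently with probability p. The article does not name a specific model, so this sketch is only an illustration of the idea. The optional seed makes runs reproducible.

```python
import random
from itertools import combinations
from typing import Optional


def erdos_renyi(n: int, p: float, seed: Optional[int] = None) -> list:
    """Return the edges of a G(n, p) random graph on vertices 0..n-1."""
    rng = random.Random(seed)
    # Each of the n*(n-1)/2 possible undirected edges is kept with probability p.
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


if __name__ == "__main__":
    edges = erdos_renyi(100, 0.05, seed=1)
    # The expected edge count is p * n*(n-1)/2 = 0.05 * 4950 = 247.5.
    print(len(edges))
```
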

In recent years, graph theory has gained more attention for modelling human social behaviour, both online, in social networks such as Facebook and LinkedIn, and in physical interaction [1].

Graphs are essential to the way social networks are presented graphically in Maltego.

1. Multigraphs: When more than one edge connects two vertices, the graph is defined as a multigraph. In social networks there may be several reasons for subjects having redundant connections. One reason, mentioned in [3], lies in the online social networks themselves. In an online social network such as Facebook, users are connected and their connections are often shown in a friend list. This means that even if a user is not vulnerable to enumeration through, e.g., an open source search, his connections may be. By definition there are several types of connections, as shown in figure 2.

The four multigraph types are complementary to each other and show how subjects are connected. In other words, the n subjects in a criminal case may be connected through friendship, groups or events, e.g. if there was a cyber attack. The multigraph in figure 2 shows a practical example of how subjects are connected in such a fictional event. Multigraph patterns such as the ones in figure 1 will almost always be seen in Maltego-generated graphs.
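
Because the same two subjects can be linked through friendship, a shared group and a shared event at the same time, a multigraph is easiest to handle as a list of typed edges. In this sketch the relation labels ("friend", "group", "event") mirror the connection types above. The labels and subject names are illustrative.

```python
from collections import defaultdict


def build_multigraph(typed_edges: list) -> dict:
    """Map each unordered pair of subjects to the relation types linking them."""
    links: dict = defaultdict(list)
    for u, v, relation in typed_edges:
        pair = frozenset((u, v))      # undirected: (A, B) and (B, A) are the same pair
        links[pair].append(relation)
    return links


def multiplicity(links: dict, u: str, v: str) -> int:
    """Number of parallel edges between u and v (0 if they are not linked)."""
    return len(links.get(frozenset((u, v)), []))


if __name__ == "__main__":
    links = build_multigraph([
        ("A", "B", "friend"),
        ("A", "B", "group"),
        ("B", "A", "event"),          # same pair, listed in reverse order
        ("B", "C", "friend"),
    ])
    print(multiplicity(links, "A", "B"))   # 3 parallel edges
    print(links[frozenset(("A", "B"))])    # ['friend', 'group', 'event']
```
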

C. Network Analysis
Xu and Chen [6] have defined three generations of network analysis:

First generation network analysis represents an analysis in which an investigator gathers data about criminals in a matrix, ending with the drawing of a link chart. While first generation tools require a totally manual approach, second generation network analysis tools visualize the link chart automatically. As will be formalized later on, Maltego uses graph theory as a foundation for these visualizations.

Second generation network analysis has probably added a lot of value in terms of effective investigations. According to Klerks [13], Analyst's Notebook, a link analysis application, was widely accepted in Dutch law enforcement as early as 1999.

Back in 2005, when Xu and Chen published their report, they stated that there were no existing third generation tools. Even though the techniques for generating visualizations have been enhanced, as have the graphical interfaces, there are no significant changes in the way the data is presented to the investigator. If we split network analysis into three phases, 1) generation, 2) pattern detection and recognition and 3) visualization, there are no known automated methods for recognizing and detecting patterns. Thus the job of detecting and recognizing patterns remains manual. In future versions of tools such as Maltego, automated SNA will likely help investigators solve and prove crimes; it probably already does. Automated pattern analysis will probably make it possible to detect the characteristics of criminal networks, in turn highlighting important structural elements and connections of a given graph.

Social network analysis of open sources differs from traditional network analysis. While traditional examples often refer to telephony logs, incident reports, bank transactions and so on (typically who called whom), social networks in open sources often refer to highly complex social structures such as LinkedIn and Facebook.

D. Social Networks
Social networks, a concept that stems from sociology, were first documented in the late 1800s. Sociology is the study of society, describing how people are related to each other. Individuals are typically tied through persistent connections or larger social groups, and such a group is typically connected through common properties.

Social network structures are typically grouped into chained, star or complete topologies, as shown in figure 1. In criminal investigations, such social networks have gained more attention in recent years because of commonly faced problems such as finding the ties between persons of interest. Fard and Ester [1] studied such a problem, namely how to identify suspects based on their ties, and concluded that it can be automated through a P2P application and a unified medium.
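
The three topologies can be generated directly, which also makes their edge counts easy to compare. On n vertices, a chain and a star both have n - 1 edges, while a complete graph has n(n - 1)/2 edges. The vertex labels in this sketch are arbitrary integers.

```python
from itertools import combinations


def chain(n: int) -> list:
    """0 - 1 - ... - (n-1): each member is tied only to its neighbours in the chain."""
    return [(i, i + 1) for i in range(n - 1)]


def star(n: int) -> list:
    """Vertex 0 is the hub; every other member is tied only to the hub."""
    return [(0, i) for i in range(1, n)]


def complete(n: int) -> list:
    """Every member is tied to every other member."""
    return list(combinations(range(n), 2))


if __name__ == "__main__":
    for name, build in [("chain", chain), ("star", star), ("complete", complete)]:
        print(name, len(build(5)))   # chain 4, star 4, complete 10
```
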

INTRODUCTION TO MALTEGO 3.0
When forensic experts collect data from open sources, possibly the foremost tasks are to document how the data was acquired and to structure it. The latter is challenging given the quantities of data involved. Paterva is a South African company behind the open source intelligence and forensics application Maltego [4]. Maltego provides a Graphical User Interface (GUI) for displaying data in several ways, such as clustering by object attributes and the centrality view, both of which are covered later on. In short, Maltego helps the forensic expert structure data. Since Maltego is more of a framework with GUI capabilities, advanced usage is based on plugins, which investigators can write themselves in a programming language such as Java or Python. Additionally, Maltego comes preloaded with some web-based plugins that use Paterva's servers. In Maltego, a plugin is called a transform.
1. Integrity: As mentioned, Maltego consists of a GUI and an input interface. The input interface is quite "dumb", simply accepting eXtensible Markup Language (XML) objects.
Thus, Maltego itself can be considered juridically sound owing to its simple architecture. The transforms it operates on may, however, introduce a factor of uncertainty. There are questions such as who created a transform, what their intentions were and how it is implemented. In some commercial cases, such as the popular SocialNet plugin [5], the transforms are closed source, so their implementation cannot be inspected.

To avoid untrusted transforms, it is easy enough to create custom ones. The script used for creating figure 6, for instance, is a LinkedIn scraper and parser that produces objects such as education and location. This works by, e.g., creating a Python transform that outputs the desired XML, which Maltego converts to objects, but there are downsides to this approach as well. Even though commercial transform pools such as SocialNet are not open source, they are well tested. Who is to say that custom transforms do not contain errors? Additionally, creating custom transforms means that the programmer has to maintain them instead of doing investigations.
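
The custom-transform route described above can be sketched in a few lines of Python. The sketch assumes the `MaltegoMessage` / `MaltegoTransformResponseMessage` / `Entities` XML layout that Maltego local transforms print to standard output. The element names, the entity types `maltego.Location` and `maltego.Phrase`, and the profile values are assumptions and should be checked against the transform documentation for the Maltego version in use. This is not a reproduction of the script behind figure 6.

```python
import xml.etree.ElementTree as ET


def transform_response(entities: list) -> str:
    """Build a transform response; each entity is (type, value, additional_fields)."""
    root = ET.Element("MaltegoMessage")
    response = ET.SubElement(root, "MaltegoTransformResponseMessage")
    container = ET.SubElement(response, "Entities")
    for entity_type, value, fields in entities:
        entity = ET.SubElement(container, "Entity", Type=entity_type)
        ET.SubElement(entity, "Value").text = value
        if fields:
            extra = ET.SubElement(entity, "AdditionalFields")
            for name, field_value in fields.items():
                field = ET.SubElement(extra, "Field", Name=name, DisplayName=name)
                field.text = field_value
    ET.SubElement(response, "UIMessages")   # empty: no messages for the analyst
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical values, as a LinkedIn parser might extract them from a profile.
    print(transform_response([
        ("maltego.Location", "Example City", {}),
        ("maltego.Phrase", "M.Sc. Information Security", {"source": "profile page"}),
    ]))
```

ElementTree escapes the text and attribute values, so scraped strings containing `<` or `&` cannot break the XML that Maltego parses.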

There are upsides and downsides to using both commercially available and custom transforms. The path chosen should be carefully considered.

2. Maltego alternatives: Before Paterva was founded in 2007, other researchers, such as Xu and Chen [6], developed a proposal for an open source collection utility named CrimeNet Explorer (CE). CE is built on the principles of hierarchical clustering, social network analysis and Multidimensional Scaling (MDS). As one might recall, these principles closely resemble those of Maltego. The controlled experiments reported in the CE paper showed high precision and recall.

Analyst's Notebook (AN), developed by i2, now part of IBM, is also an alternative to Maltego based on Social Network Analysis. The main concepts of operation implemented in AN [8] are:

In highly centralized networks, SNA is used for finding the subject that dominates the network

Betweenness is a measure of how many shortest paths run through an entity

Link betweenness is how many shortest paths run through one link

Closeness measures the proximity of an entity to the other entities in the network

Degree: how many links (i.e., edges connected to a vertex) are connected to a subject

Eigenvector: in addition to counting connected subjects, each link is weighted by the influence of the subject it leads to. These weights are combined in the eigenvector

Link direction says how information flows in the network

Link weightings relate to how well-connected the subject is
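
Of the measures listed, eigenvector centrality is the least obvious to compute by hand. A common approach is power iteration: repeatedly give each vertex the sum of its neighbours' scores and renormalise. This is a generic sketch, not i2's implementation. It iterates on A + I rather than on the adjacency matrix A. The shifted matrix has the same eigenvectors, but the iteration also converges on bipartite graphs such as a star.

```python
import math


def eigenvector_centrality(adj: dict, iterations: int = 200) -> dict:
    """Power iteration on (A + I) for an undirected graph given as adjacency sets."""
    scores = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Own score + neighbours' scores = one multiplication by (A + I).
        new = {v: scores[v] + sum(scores[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(s * s for s in new.values()))
        scores = {v: s / norm for v, s in new.items()}
    return scores


if __name__ == "__main__":
    star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
    print(eigenvector_centrality(star))
```

For a star with three leaves, the hub ends up with a score √3 times that of each leaf. The hub matters both because of its many links and because every leaf depends on it.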

In this section we have looked at two alternatives to Maltego. As shown, many of the same techniques are used in both CE and AN, especially with regard to graph theory. When it comes to accuracy, reliability and integrity, however, there is a difference between Maltego and the others: Maltego relies on automated input, while CE and AN rely on manual input. Thus, accuracy, reliability and integrity depend on the scripts automating the process in the former case and on the subjects inputting the data in the latter. With regard to visualization, Maltego, CE and AN all seem to rely on the same principle: graph theory.

MALTEGO 3.0 IN ACTION
Now that an introduction to the foundations of SNA has been given, it is time to see how they can be utilized to extract as much information as possible from the networks. In this chapter we review the following techniques for structuring the gathered data: (1) the centrality principle, (2) clustering, and (3) object attribution. For each technique, we also review how Maltego can be used to ease the task of such structuring and visualization. This is done by presenting a number of proofs of concept (POCs) in the form of example figures. The data used in these POCs was not gathered from public sources but, for practical and ethical reasons, generated in an ad-hoc manner.

When displaying a graph in Maltego, a view type has to be chosen. A view type can be considered a set of rules for how the vertices are organized and displayed. Maltego has four built-in view types: (1) Mine View (MV), (2) Dynamic View (DV), (3) Edge Weighted View (EWV) and (4) Node List View (NLV). MV displays the nodes in a hierarchical manner, where the vertices with only outgoing edges are at the top and those with only incoming edges are at the bottom (see figure 13). Both DV and EWV organize the vertices so that the ones with the highest number of outgoing edges are placed at the center of the graph. One difference between DV (e.g., figure 12) and EWV (e.g., figure 11) is that the vertices in EWV are given a size based on their number of outgoing and incoming edges. This may make it easier to detect and evaluate a vertex's importance by comparing its size with that of the other vertices.
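
The layout rules described above depend only on each vertex's incoming and outgoing edges. The minimal sketch below assumes a directed edge list. It finds the vertices that Mine View would place at the top (only outgoing edges) and at the bottom (only incoming edges), and computes a size that grows with total degree, in the spirit of the Edge Weighted View. The `base` and `step` scaling constants are arbitrary assumptions, not Maltego's actual formula.

```python
from collections import Counter


def layout_hints(edges: list, base: float = 10.0, step: float = 5.0) -> dict:
    """Derive Mine-View-style tiers and Edge-Weighted-View-style sizes from directed edges."""
    out_deg = Counter(u for u, _ in edges)
    in_deg = Counter(v for _, v in edges)
    vertices = set(out_deg) | set(in_deg)
    return {
        "top": {v for v in vertices if in_deg[v] == 0},      # only outgoing edges
        "bottom": {v for v in vertices if out_deg[v] == 0},  # only incoming edges
        # Hypothetical scaling: size grows linearly with in-degree + out-degree.
        "size": {v: base + step * (in_deg[v] + out_deg[v]) for v in vertices},
    }


if __name__ == "__main__":
    print(layout_hints([("person", "email"), ("person", "phone"), ("email", "domain")]))
```
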

NLV is a list containing all the vertices in the graph, as well as the most important information, such as type and value, for each vertex.

A. The Centrality Principle


When doing social network analysis, we need a metric to measure the importance of each vertex. In their work, Chen et al. have measured the centrality of a vertex to determine its importance. This is referred to as the centrality principle [9]. They calculate three different types of centrality: degree, betweenness and closeness. The following definitions are based on i2's work on Analyst's Notebook [8]. The degree centrality of a vertex is a measure of how many vertices it is directly connected to. The information value of a high degree centrality depends heavily on the rest of the network structure. E.g., if the other nodes are also directly or indirectly connected to each other, a large degree centrality is worth less than if they are only connected through the central vertex. The betweenness centrality measures how a single vertex connects different cliques. A clique is a subset of vertices in a graph in which every vertex is connected to every other. If a vertex is the only vertex able to relay communication between two or more vertices, it corresponds to a single point of failure in the network: if the vertex is removed from the network, the communication between the cliques halts. The closeness centrality of a vertex relates, as the name implies, to its distance to the other vertices. This distance can be considered a combination of geographic distance and distance in terms of the number of vertices between the two vertices.
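
On small graphs, all three measures can be computed with a few breadth-first searches. The sketch below is a plain-Python illustration for an undirected, unweighted graph given as adjacency sets, with every vertex present as a key. Closeness uses hop counts only, so the geographic component mentioned above is left out. Betweenness uses Brandes' algorithm, a standard method that the article itself does not mention.

```python
from collections import deque


def bfs(adj, source):
    """Distances, shortest-path counts, predecessors and visit order from `source`."""
    dist, sigma = {source: 0}, {source: 1}
    preds = {v: [] for v in adj}
    order, queue = [], deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:                 # first time w is reached
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:        # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def degree_centrality(adj):
    """Number of vertices each vertex is directly connected to."""
    return {v: len(adj[v]) for v in adj}


def closeness_centrality(adj):
    """Reachable vertices divided by the sum of hop distances to them (0 if isolated)."""
    result = {}
    for v in adj:
        dist = bfs(adj, v)[0]
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(adj):
    """Brandes: sum over other pairs of the fraction of shortest paths through each vertex."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        _, sigma, preds, order = bfs(adj, s)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):             # farthest vertices first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair (s, t) is counted from both ends.
    return {v: b / 2 for v, b in bc.items()}
```

On a path a - b - c, b gets a betweenness of 1.0 because it lies on the only a-c path. In two triangles joined through a single extra vertex, that joining vertex scores highest: it is the single point of failure described above.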

Using these three centrality measurements, different roles in a network can be identified. It may be possible to identify individuals who are crucial for the network to function. However, it is important to recognize that the findings may be incomplete or inaccurate, since leaders may keep a low profile in these networks [9].

B. Clustering
In a court case there are often large quantities of data. Imagine that the data has been gathered by a custom web spider enumerating a subject's connections over several online social networks, and let the total number of connections be 1000.

Cluster analysis is the process of assigning each object in a set to a group. When the objects are visualized, the cluster density and the distance from one object to the others represent how alike the objects are. The different graph view types in Maltego visualize clustering differently, and the dynamic and edge weighted views are the most suited for cluster analysis. Figure 7 shows an example of the edge weighted view.
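
One simple way to assign objects to groups is to treat two vertices as alike when their neighbourhoods overlap strongly (Jaccard similarity) and then merge alike vertices into clusters. The sketch below does this for illustration only; it is not the algorithm behind Maltego's views. The threshold of 0.5 is an arbitrary assumption.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of the combined elements that two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(adj: dict, threshold: float = 0.5) -> list:
    """Group vertices whose closed neighbourhoods have Jaccard similarity >= threshold."""
    parent = {v: v for v in adj}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]     # path halving keeps the trees shallow
            v = parent[v]
        return v

    for u, v in combinations(adj, 2):
        # Closed neighbourhood (vertex plus its neighbours) so adjacent pairs can match.
        if jaccard(adj[u] | {u}, adj[v] | {v}) >= threshold:
            parent[find(u)] = find(v)         # union: u and v end up in the same group

    groups: dict = {}
    for v in adj:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())
```

For two triangles joined by a single edge, the sketch returns the two triangles as separate clusters. Within each triangle the similarity is at least 0.75, while across the joining edge it is only 2/6.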

C. Object Attribution
Social networks such as Facebook, LinkedIn and Twitter carry a lot of metadata attributed to specific users. From this point on, metadata such as age, gender, workplaces and education is referred to as the attributes of an object. Some interesting common attributes were shown in [2, p. 23]. The simple analysis of the three social networks showed that typical common attributes are first name, last name and a profile picture.
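
Matching accounts across networks on shared attributes like these can be sketched as a simple score: the fraction of attributes, present in both profiles, whose values agree. The attribute keys `first_name`, `last_name` and `picture_sha256` are hypothetical. Comparing profile pictures by hash only catches byte-identical images.

```python
def match_score(p: dict, q: dict,
                keys=("first_name", "last_name", "picture_sha256")) -> float:
    """Fraction of attributes present in both profiles whose normalised values agree."""
    shared = [k for k in keys if p.get(k) and q.get(k)]
    if not shared:
        return 0.0   # nothing to compare, so no evidence either way
    agree = sum(str(p[k]).strip().lower() == str(q[k]).strip().lower() for k in shared)
    return agree / len(shared)


if __name__ == "__main__":
    facebook = {"first_name": "Ola", "last_name": "Nordmann", "picture_sha256": "ab12"}
    linkedin = {"first_name": "ola ", "last_name": "Nordmann"}
    print(match_score(facebook, linkedin))   # both shared attributes agree
```

A high score is only a hint, not proof of identity: common names collide, which is exactly the risk of erroneous connections discussed below.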

Briefly explained, the goal of attribution in Maltego is to associate different types of data with a common root cause based on a combination of available evidence. The root cause is best presented as individuals, groups or communities associated with, e.g., a crime.

What we have seen of Maltego so far leads to an interesting problem: can data be displayed in misleading ways, e.g., in court?

A common problem that arises when working with graphs built from fundamentally different data, or data of different types, is how to combine the data so that relevant connections are still found without erroneously creating false ones. This is quite important, since the reason for using Maltego is to find these connections and structure large quantities of data. A more advanced example can be taken from attack attribution, where many sources of information have to be taken into consideration to reveal the relevant patterns.

A solution to combining different datasets has been proposed by O. Thonnard [10]. The solution is based on clustering and further data aggregation using Multi-criteria Decision Analysis (MCDA). The result is data that can be analysed from multiple viewpoints, making it possible to find patterns of interest. The thesis also shows how to combine the viewpoints, so-called data fusion. As an example of how this applies, a set of profile vectors from Facebook (user vectors enumerated, e.g., from a single user by the centrality principle) may be combined with the data log of a criminal profiling system. These two systems generate different types of data, which may be converted to vectors and then processed by MCDA and clustering.
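
The aggregation step can be illustrated with the simplest MCDA operator, a weighted mean of per-criterion similarity scores. As far as we understand, Thonnard's thesis relies on more elaborate aggregation functions, so this sketch is only a stand-in. The criteria names and weights are invented for the example.

```python
def weighted_mean(scores: dict, weights: dict) -> float:
    """Aggregate per-criterion similarities in [0, 1] into one overall score."""
    total = sum(weights[c] for c in scores)
    if total == 0:
        raise ValueError("weights of the scored criteria sum to zero")
    return sum(scores[c] * weights[c] for c in scores) / total


# Hypothetical weights: how much each criterion counts when comparing a
# Facebook-derived profile vector with a record from a criminal profiling system.
WEIGHTS = {"name": 0.2, "location": 0.3, "social_ties": 0.5}


if __name__ == "__main__":
    print(weighted_mean({"name": 1.0, "location": 0.5, "social_ties": 0.0}, WEIGHTS))
```

With these weights, a perfect name match alone contributes little. The pair only scores high if the social ties also agree, which reflects the design choice of trusting structural evidence over easily shared attributes.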

In Maltego, attribution is implemented using both predefined and dynamic attributes. When a transform generates its entities, it may add as many additional value types as needed, so different entities of the same type may contain different sets of values. The predefined properties have the advantage over the dynamic ones that they support arrays of values. This may come in handy, for example, if there are multiple direct connections between two entities. While doing analysis on entities, these attributes may provide additional clues about where to look for more information.

There is a strong belief that object attribution also has large commercial potential for tailoring services to customers' needs. The online service RapLeaf [6] claims to be able to identify customer attributes such as age, relationship status and hobbies. This is done by gathering information from open sources.

D. Fusing network analysis and object attribution
When preparing for SNA with Maltego, the investigators have to determine what should be defined as entities and what may be left as attributes. The entities being analyzed in Maltego may have a vast number of attributes. These attributes may be divided into three groups. (1) Some attributes are unique to the entity, e.g., email address and phone number. (2) Other attributes are shared among indirectly connected entities, such as participation in an event or presence at a location at a given time. (3) The third type consists of attributes that may be shared by several entities, but where equal values do not imply any indirect connection between the entities.

Before starting the analysis, the analyst should determine which attributes to model as entities and which to keep as additional attributes on other entities. Even though it is possible to run transforms from an entity with several attributes, connecting to other entities through the additional attributes should be considered a bad idea. Imagine a Person entity with the primary value Name and the additional attributes EmailAddress and Phonenumber. Let us call this person A. When running a transformation, a connection to Person B is made. This connection indicates that A has been in contact with B, but, as figure 13 reveals, we are not able to tell whether the contact was made by email or by phone.

By letting both EmailAddress and Phonenumber be individual entities of their own, as seen in figure 14, we are able to tell whether the contact between A and B was through email, phone, or both. As well as giving an example of how additional entities may provide more information, figures 13 and 14 also demonstrate some of the different views that are available in Maltego.
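
The difference between the two modelling choices can be reproduced with a tiny graph. In the sketch below, A and B are connected only through email and phone entities, so the type of each shared intermediate entity tells us the channel of contact. If the identifiers were instead stored as attributes on the Person entities, only a bare A-B edge would remain. The entity names and values are hypothetical.

```python
from collections import defaultdict


def contact_channels(edges: list, entity_type: dict, a: str, b: str) -> set:
    """Return the types of the entities that both `a` and `b` are linked to."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    shared = neighbours[a] & neighbours[b]
    return {entity_type.get(e, "unknown") for e in shared}


if __name__ == "__main__":
    edges = [
        ("Person A", "a@example.com"), ("Person B", "a@example.com"),  # both tied to one address
        ("Person A", "+00 000 000"),                                   # phone tied to A only
    ]
    types = {"a@example.com": "EmailAddress", "+00 000 000": "Phonenumber"}
    print(contact_channels(edges, types, "Person A", "Person B"))  # {'EmailAddress'}
```
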

DISCUSSION AND CONCLUSIONS
In this article we have presented both the mathematical and the historical background of Social Network Analysis. After describing what SNA is and why we need tools to do it efficiently, we reviewed the commercial tool Maltego 3.0. With regard to using Maltego 3.0 for gathering data that may be used as evidence in a court of law, we discussed how the reliability and integrity of Maltego largely depend on the tools used to gather and structure the information. While reviewing Maltego, we placed special emphasis on what types of information could be extracted from the social networks. We have reviewed how these types of information can be organized in the different views in Maltego, and explained some of the differences between these views. We have discussed how Maltego can be used to perform object attribution in order to discover more about each entity, and how this, combined with SNA, may give extensive information on an entity and how it is connected with its surroundings. Finally, we discussed how defining attributes as a set of entities for another entity may give more accurate information on how entities are connected.

It seems obvious that with the rapidly increasing amount of data publicly available on the internet, the value of SNA will continue to grow. The automation in second generation network analysis tools has expanded the limits for how much data can be analyzed, and therefore makes it possible to utilize more of the available data. However, there seems to be a long road ahead before actual automation of SNA becomes a reality. This especially applies to the reliability of indirect connections between entities, i.e., when there are events or objects that connect two or more people. It should also be considered a limitation that today's tools and techniques do not allow analysts to analyze the evolution of a social network over time. On smaller social networks, however, automated tools such as Maltego 3 may provide an efficient, reliable and accurate visualization. Automated tools may enable analysts to understand the properties of a network faster and also to uncover patterns and connections that the analyst did not initially look for or know existed.
To be able to use the information acquired from Maltego 3 in court, one needs to be able to prove the information's reliability and integrity. As discussed earlier in this paper, these properties rely heavily on the tools used for gathering the data. There is a dilemma between using closed source but widely trusted tools and using in-house tools whose source is known but which are less tested. This will probably remain an issue, but a combination of the two is likely to fit most needs.

A. Further work
As there are some fundamental differences in how data is entered into tools such as Analyst's Notebook and tools such as Maltego 3, it would be interesting to look at how this affects the integrity and reliability of the evidence they provide. This could be done by defining how uncertainty in general can be measured in the data gathering tools.

There is also a need for more experimental research on the value of data gathering in open sources. This is especially relevant for the accuracy and correctness of the data gathered from sources such as Facebook and LinkedIn.

Finally, it could be interesting to look at how social networks evolve over time, i.e., how events affect the structure of the social networks. This could be done by gathering data over a longer period of time and assigning a timestamp to the data that is gathered.
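
Assigning a time to each gathered connection makes it possible to replay the network as it looked at any point. The minimal sketch below assumes that each observation is stored as a (u, v, date) tuple. The dates and names in the usage example are invented.

```python
from datetime import date


def snapshot(observations: list, until: date) -> set:
    """Undirected edges observed on or before `until` (repeat sightings collapse)."""
    return {frozenset((u, v)) for u, v, seen in observations if seen <= until}


def degree_over_time(observations: list, vertex: str, checkpoints: list) -> list:
    """Degree of `vertex` in each snapshot, to see how its position evolves."""
    return [sum(vertex in edge for edge in snapshot(observations, t)) for t in checkpoints]


if __name__ == "__main__":
    obs = [("A", "B", date(2011, 1, 1)), ("A", "C", date(2011, 6, 1)),
           ("B", "C", date(2012, 1, 1)), ("A", "B", date(2012, 2, 1))]
    print(degree_over_time(obs, "A", [date(2011, 3, 1), date(2011, 12, 31), date(2012, 12, 31)]))
```

Storing raw observations rather than a single merged graph keeps every snapshot reproducible, which also helps when the timeline has to be defended as evidence.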

1. S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications, Cambridge University Press, 1994.

2. A. Milani Fard and M. Ester, Collaborative Mining in Multiple Social Networks Data for Criminal Group Discovery, School of Computing Science, Simon Fraser University, BC, Canada, 2009.

3. E. Petrido and M. Kuczynsk, Synergy of social networks defeats online privacy, University of Amsterdam, System and Network Engineering, 2011.

4. M. Gjoka, C. T. Butts, M. Kurant and A. Markopoulou, Multigraph Sampling of Online Social Networks, UC Irvine.

5. R. Temmingh and A. MacPherson, Maltego: Open Source Intelligence and Forensics Application, website.

6. RapLeaf, RapLeaf website.

7. D. Clemens, SocialNet website: Maltego common online social media outlets search, automated searching for actors' identities.

8. J. J. Xu and H. Chen, CrimeNet Explorer: A Framework for Criminal Network Knowledge Discovery, ACM Transactions on Information Systems, 23(2), pp. 201-226, April 2005.

9. B. Carrier and E. Spafford, An Event-Based Digital Forensic Investigation Framework, Center for Education and Research in Information Assurance and Security (CERIAS), Purdue University, 2004.

10. i2, Analyst's Notebook 8: Social Network Analysis, June 2010.

11. H. Chen, W. Chung, J. J. Xu, G. Wang, Y. Qin and M. Chau, Crime Data Mining: A General Framework and Some Examples, Computer, vol. 37, no. 4, 2004, pp. 50-56.

12. O. Thonnard, A multi-criteria clustering approach to support attack attribution in cyberspace, École Nationale Supérieure des Télécommunications, March 2010.

13. D. Bradbury, In plain view: open source intelligence, Computer Fraud & Security, vol. 2011, no. 4, April 2011, pp. 5-9.

14. D. Bradbury, Data mining with LinkedIn, Computer Fraud & Security, vol. 2011, no. 10, October 2011, pp. 5-8.

15. P. Klerks, The Network Paradigm Applied to Criminal Organisations: Theoretical nitpicking or a relevant doctrine for investigators? Recent developments in the Netherlands, Dutch National Police Academy, Apeldoorn, 1999.

Tommy
Tommy (B.Tech., M.Sc.) is a seasoned cyber security analyst with experience from both the government and private industry. He works daily with data- and intelligence-driven cyber security operations.


The Security Diary © 2018

