SubGraphChallenge 2017-02-09

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 21

Subgraph Isomorphism Graph Challenge

- draft -

Siddharth Samsi, Vijay Gadepally, Jeremy Kepner, Albert Reuther

http://GraphChallenge.org
Outline

• Introduction

• Data Sets

• Static Graph Isomorphism Challenge

• Metrics

• Summary

Graph Challenge - 2
Introduction

• Previous challenges in machine learning, High Performance Computing and visual


analytics include
– YOHO, MNIST, HPC Challenge, ImageNet, VAST
• GraphChallenge encourages community approaches, such as DARPA HIVE, to
develop new solutions for analyzing graphs derived from social media, sensor feeds,
and scientific data to enable relationships between events to be discovered as they
unfold in the field
• GraphChallenge organizers will provide specifications, data sets, data generators, and
serial implementations in various languages
• GraphChallenge participants are encouraged to apply innovative hardware, software,
and algorithm techniques to push the envelop of power efficiency, computational
efficiency, and scale
• Submissions will be in the form of full conference write ups to IEEE HPEC which will
allow participants to be evaluated on their complete solution
Graph Challenge - 3
Graph Challenge

• GraphChallenge seeks input from diverse communities to develop graph challenges


that take the best of what has been learned from groundbreaking efforts such as
GraphAnalysis, Graph500, FireHose, MiniTri, and GraphBLAS to create a new set of
challenges to move the community forward

• Initial Graph Challenges


– Static Graph Challenge: Sub-Graph Isomorphism
• This challenge seeks to identify a given sub-graph in a larger graph

– Streaming Graph Challenge: Stochastic Block Optimization


• This challenge seeks to identify optimal blocks (or clusters) in a larger graph

Graph Challenge - 4
Static versus Streaming Mode

• Static graph processing


– Given a large graph G
– Evaluate ƒ(G)
• Two classes of streaming: stateless and stateful
– Stateless: process data g as it goes by ƒ(g)
– Stateful: add data g to a larger corpus G and then process corpus ƒ(G + g)
• Graph processing is often in the stateful category
– Easy to simulate by partitioning G into streaming pieces g

Graph Challenge - 5
Labels and Filtering
• Filtering on edge/vertex labels is often used when available to reduce the search
space
– Filtering can be applied at initialization, during intermediate steps, or at the vertex level
– Very problem dependent

• Some Graph Challenge data sets have labels and some data sets are unlabeled
– Some participants will want to filter on labels

• Initial example implementations will work without labels


– Labels can be used if they choose

• GraphChallenge is judged by a panel


– Panel will value the variations that are submitted

Graph Challenge - 6
Outline

• Introduction

• Data Sets

• Static Graph Isomorphism Challenge

• Metrics

• Summary

Graph Challenge - 7
Publicly Available Datasets
Initial Datasets in Blue
Stanford Large Network Dataset Collection snap.stanford.edu/data
• Social networks (10 data sets up to 4.8M vertices & 69M edges) • Road networks (3 data sets up to 1.9M vertices & 2.8M edges)
– Online social networks, edges represent interactions between people – Nodes represent intersections and edges roads connecting the
intersections
• Networks with ground-truth (6 data sets up to 66M vertices & 1.8B edges,
e.g. Friendster) • Autonomous systems (5 data sets up to 26K vertices & 106K edges)
– Ground-truth network communities in social and information networks – Graphs of the internet
• Communication networks (3 data sets up to 2.3M vertices & 5M edges) • Signed networks (10 data sets up to 4.8M vertices & 69M edges)
– Email communication networks with edges representing communication – Networks with positive and negative edges (friend/foe, trust/distrust)
• Citation networks (3 data sets up to 3.7M vertices & 16M edges) • Location-based online social networks (2 data sets up to 198K vertices
& 950K edges)
– Nodes represent papers, edges represent citations
– Social networks with geographic check-ins
• Collaboration networks (5 data sets up to 23K vertices & 198K edges)
• Wikipedia networks, articles, and metadata (7 data sets up to 3.5M
– Nodes represent scientists, edges represent collaborations
vertices & 250M edges)
• Web graphs (4 data sets up to 875K vertices & 5.1M edges)
– Talk, editing, voting, and article data from Wikipedia
– Nodes represent webpages and edges are hyperlinks
• Twitter and Memetracker (4 data sets up to 96M vertices & 476M edges)
• Amazon networks (5 data sets up to 548K vertices & 3.4M edges)
– Memetracker phrases, links and Tweets
– Nodes represent products and edges link commonly co-purchased
products • Online communities (3 data sets up to 2.3M images)
– Data from online communities such as Reddit and Flickr
• Internet networks (9 data sets up to 62K vertices & 147K edges)
– Nodes represent computers and edges communication • Online reviews (6 data sets up to 34M product reviews)
– Data from online review systems such as BeerAdvocate and Amazon

Graph Challenge - 8
Publicly Available Datasets
Initial Datasets in Blue
AWS Public Data Sets aws.amazon.com/public-data-sets Yahoo! Webscope Datasets webscope.sandbox.yahoo.com

• Astronomy (1 data set 180 GB) • Advertising and Market Data (4 data sets up to 3.7 GB)
– Sloan Digital Sky Survey SQL MDF files – Yahoo!'s auction-based platform for selling advertising space
• Biology (4 data sets up to 200 TB) • Competition Data (3 data sets up to 1.5 GB)
– Genome sequence data
– Data challenges run by Yahoo
• Climate (3 data sets size growing daily)
• Computing Systems Data (5 data sets up to 8.8 GB)
– Satellite imagery data
• Economics (10 data sets up to 220 GB) – Computer systems log data
– Census, transaction, and transportation data • Graph and Social Data (3 data sets up to 5 GB)
• Encyclopedic (10 data sets up to 541 TB - Common Crawl Corpus) – Graph data from search, groups, and webpages
– Various online encyclopedia data
• Image Data (3 data sets up to 14 GB)
• Geographic (3 data sets up to 125 GB)
– Flickr imagery and metadata
– Street maps
• Mathematics (1 data sets 160 GB) • Language Data (29 data sets up to 166 GB - Answers browsing behavior)
– University of Florida Sparse Matrix Collection – Wide range of question/answer data sets
• Ratings and Classification Data (10 data sets up to 83 GB — 1.5 TB
Graph500.org Data Generator dataset withdrawn)
– Community preferences and data
• Used to generate world’s largest power law graphs
• Can be modeled with Kronecker Product of a Recursive Tools to generate (at various scales and parameters)
MATrix (R-MAT): G⊗k = G⊗k–1 ⊗ G
– Where “⊗”denotes the Kronecker product of two matrices data sets will be provided as part of the challenge

Graph Challenge - 9 • R-MAT: A Recursive Model for Graph Mining, Chakrabarti, Zhan & Faloutsos (2004)
• Mathematically Tractable Graph Generation and Evolution, Leskovec, Chakrabarti, Kleinberg & Faloutsos, (2005)
GraphChallenge Data Formats

8
57
78
rds

es1 ang5
ha
ric

ctiv qu
y_

ete am
_ton
nts

ph
odey
ha

ey
erc

ha adyD
SS

DrKe

davidth

bb
moh
M

DomSpoerli
PO urj ed

nA
PG

a
ID ik

L
_y
lu

LinuxMisyone ri

nto
rin
am

re
Va

DayTTa
ire

im
Sinan

ate sl
je

ow
mD RrlaciS
c

ed
dn

ak
Jh tors

D
ris
e
Ra

6C
lsrl

bem
rm

_h
he

unda ult IC nfos

ajou

ch
on

linuxnotla

z ay
ch

Cana
9

lp

an
ah
elA yK

ek

terd
m

ya
hem

luqm
nM
nx

kO
nd

r
on

a_rek rketCI neGe

og nixnd
Ju yP

es
2n
n
ks

rwso
n ge inen

veet
ri

ey
hipa
dJa
l

ha
O
Ed

an T_
oist ola

an o karisaer
ita

lnik Midma lbour


gutweiler
kk

rm gerD rsohlenmis
ig riK

Clic

psy
Pet i usiJaa

mou wor

kasimerkan rees
nD

stertpraetorianlabs

leillK
Me

u3re
so

g
eI U

RFairclo
kI

sya

G po bittaGin ndWod

ct3
_S us login
ea Aks A

Net
ntai ks

e haJHsv

esh
t t
Rerie ne es kki

Th C Gr_

in t
kt
thrertis
C

nresw8rbwpgri4
ckie vaH ez

ec

icky de
on

atC
altd LD

clear
Ti

tuA
AA phai CA
rism
gu d Pere PJ

nsI 4Vchinsi

Fauta
e g owie

re
G ns
VM

pan
an S

ughdom
cuer t ou u27 Te nfo
chmichto vikY

rFF ies7 gg
RN ne

ooha
Bfo
et Lara garin olic afta B

Info

m 107

rdD
Thai

ec
tofuS
Hop sh Cas d

vnik5287
glew

run

cFieose
rC

scottdware
arx etIO
m mK Fran

P n2te
gm

itc te av rso
icloha

mastalet

raM n
Ha
ar

D
nic

gu

hej
ud H el

ecJ joe
U kg 1

es
k__w

ile
ud ello _ linprot

sw In djdrdika

emoore Aio
lh0

eill
ku im

Sock

an s
_
ig
F ymza ta r wi

JayKyuu

uz

thecre
toaxgu
esu kle

anHa ob
josephredfern

nho
Arb

n
ota asgr bioS or

Ch1 coliccn
TheGutes
e

hyeti
he aHuhmpile

M
Gar m ala
Infola it eK rbyem ng

s in
willie x g3aT

Nftas
Ja
oherr

hing
eckm
dire
ikis rk Liv ecm rli inbimat Jasin anmegerCo heini
hm er6_6
llsttArt

s
Sureca are

ne
TACp cje teng

Cover
eri

m
pa ia D num
Hot uug th ays se onys Scec hester Ge ut no a the88 PauliAlti

tI
od
shca g se

Am
ulC k

ch Sec
pestnd GISg sal ian

Tural30
vid th sp Pa dd kjope tomihj jve
k rack Spo ee info Cynpsd an w b berze kra ilwain

ieIO
am da

ity
deerv t vo Ho pz cy junaxu mattmc atotoro

CertiVox
CurlyBe de UX lexisani io_ ox saunmobify
autys equarte d_o Bar Turm 15
Bluc
n oinB ELF
y BAES Andvgc reaerf thalmicTese AlexKara laGe Indie
msauk ko ano
ospad
nodesecu ystems
evktalo archer aggiora
giannim
ichetandhem rity _AI r00f nikmd23 hack_lu SofieHa gen ova
Apryl_P
Tiffani_B
bre bull3tp erry atseitlin
ibroadfo
minoad7 nolaforensix xandersh farahney
antkanan is RusiAlpo KPbewelldoc
_tty0
SexyLikeMeios
AcademiaObscura inboxbygmail mikk0j kev_mcdougall skriptuurus
defcoin
ustwogames anttitikkanen
willcanine SpikeBeeJo
l BitquarkBot twittersecurity NeustarCTO
martinbubertendart
romainle daavidhentunen marpichun
AbiWilks HopeFran
jerrychen k

bspector
miratim GPGTools chrisb jonasanbrow herrod
sitelutions certiv
kelly
nsporndly
TK5EPfujin_ i Reversity camicelli shittydogs Mbosinwa
ihmd mu ne ge
elizae
ox
stevewerby
Agsarr simonb
lab abhishe fallen jehova SoluM prcheachineney s
ksau

bradfitz
man and idcom kmdb jasontco
therealred wZeal
peoE i_FRCryp ayly soto phil hen mesosphe Boulevard_
dotG ngc

C
pra ponore leahk

VElet linux jimfilm


resentf
yu toV iliMoxistorM Beer

YaiAou
H
PureNe cpu PekkaN

n
ijim illa gua uipers
an en sege

hilto

So
n0 out
nalytics riaNig q hN3Weikan

RISI
landonfulle
h_ sleemarkacigeeb l htD stin erikander

IO
ar hum angeTe e

Piqua kreno
l iya ws der

oursdemer
x0

m
RulerA 75

accessjames
iveStu

No vir abn
eliGN
avemd re st viP_ri mat eichp9
Daarchuser ble ro alb lorbo son

m
h ic
0 sb
xa

samhocevar Lucio
Ha aeFe
reNZtr n1dq Se ria ma ertl04
sV ne Fig dio

ers
an

ultweet

erdo
einm evetesh tbra1c 0olng

rs Cus
ini10

osxnt
_
Z ulliv nm
loogxii nox_ CAh arsmi reasona htThe ha

mo
An onO_Po cke mhausher

i b rafa
anarchival

nLucceitanm
eC om
Explo wo JR ga Gudad

mathpunk
pNattrs s es

eC ords tek yTec


ar ym
Scoozim NSaNew Jeffr bly o wer13 rspace

Sa
tem m

orp mitS
odstar roag

DrPr hiltil IVAW

ellla_
a

s
r

auditdi
er
o

jonoevheat Gajam
WOGofFARs ztehowe lessn lehtio eyCh snd jda sb

amA ns
ysh_ga er4 G su Ancks emmangoldstein DC s Br rebrowfest bernin

zfson
lana

i
deklano

rcasticlArchip
Ubad

rgy
irib b

To9
_ua ian

yami
ksid1HA ng r2 upha

bth
iusS oRn o

esaopera
er_P

erd9
fis ot top Uto Go d wski

ba herc
ga

alk
rlie mu
a t AS nt n

raffi
tre ea
2 ifeulne

ystem
SBhi
lly

MarineTckUK
dpKhal ud je an e es Keltoedy D ke lerthbpi Erik ule arOS

pdhalili

idEne
a Rno Colis jjcga

zar
flamsm

trcoipn mille
aark

ett ca rm
lent
dyR ouln rtw en nn t_is _S

PoliceVideo
Clo qm

Spac dk s_ed
r gusuEC

sh
Kolib

open

cijournalism

m Mw
Roc

LE_spckyfe an cTionk

litt
_girlveRo

H i sununteVer
Pron

gg

pS
bla Mic un utro

ip11 To ey
cu hAatrtifa_io an okjetaoscl felip ervic

eti
An

erC flr lcdc


taBjaee

fallsinJe
Ro
ad ivlp ree

lo oP
uic A st ysSea
ra ti ele P

sleepyle UnmagicalMe

ace rreiraikam
V
iallu sktnwe LB e on

lec PerE
hae hahe
deray

tieurm
ck hqeirllee etbe er

es
ecli

wd Debod
theodoricm

hory
z

T
M
or db

eo

th no ainDeb
bct de

rech
avalo

The_SolarS
re

yawnbo

he 6ra6nd
TheMatja
z hersvm

Lilia
s ofic

Astero
d e Fo

op
rs oor anna

coracchri

cgoa nN coineaslp
alc din
fre pla
mile LxM C2D rtsra penos ap adrie

pQto 6 i_dude s y erlo


rte nichquito

at
am Netm
tions

sonmic uardng

rg
Ph

le
ni
arkila langl

__agwa
sc

rtorseregu

lle
a_ma
ageis eb otr_im

ra e c ndr RCAn
tlu i diea alex n692

etnaS

Dtoistitzere
birds
olas flo
sd nsP langcKry P nruiz ajuC

uin eb 16
p

ro
_S

ulist sin
gda Fr
skim

ul
DebConf tonic
Ap

en wsom om RCte
jheb orba UN 90

on ver
JD nd

th
tealc

4b
pp lear

th
Jam

bz
l mfr

us532 sy ks kg

ca
n
u ep H o

eb
mur
lia

ut1 a
gir le hu

EightTons
Yo

urrier

ke Swdmro s ISorror
yipo
as age tocoIL_ ra

e201
sednawk ert ieli BF

spst ks
ge lesh n3

he

AC
keska
lComunityCube cssq

jlan

al
neahe
x
ca

scnd
O
cl

lly eieeidg_
_jo
KillYourTV_I2P Fund

one
cavatica
an ure ejomcin Fdan_sie

rshall_
ow 005

edo

ira ckstca
Cha
ktro anns
CSua

fpa ey_va _mattio

McN
n

lan _N
mie

casa
o1 ne

esR

tte

oo
Dog me

LUrlieS
eyer

lecy
M in huAPISHsllvim re_u

Ibasw
bo

rdluu
ana
be

HUBB be
__ nejoepie91

4
NT

s geln
ary

T
muO
te

raisen
ItalianFD
dPCcam
jim

ni
iamjohnoliver o

tie
nkI inm Flo nb

anto
na
sM

thhn
i2porignal

n
O kurs Kedecourl aks

ibNorC

Sklcrshrmtn
o

indie
lej
vin ayy11 mna tilla

ntoon Kad

im

eirsv
die
ichjo

plan

swad
IC

sl1pe Ja dSmta

CC
au

lu

_m
sander

o_h oka nt_Thoma


rq

ug

odry
mer
ans
MorH

ior
hremWasTaken

errls

leda coDa g isla uk Goo


lly ri illn
is H DebonairFox

ele

Tom

386_ 3Mch
te

Street_Dogglevudev

eSpto
enn
un is kayg
augru

4kmith
es_

urx smau uzat s_Koll


ic od UID

al
les
Da

jim
Ke ust w

die
thle

au
mipham
ks ooks
Sp g _

avs
anked

l3 vy_Cre
g bHvied mm
TA
am

dyt

ParteiHH
nieM

att

l89 oks
Har

yaurlana

Nienor_
BB a

utar

3tnO
ff
ehes

SX
Yorgosh
iamsambee

RA
nrtG MsLods lRari

CrowdTRe
mon

ez
r ka

erg
phlast
opjk

boo_eb

oklCh Tire
diau

kayla_ebo

ss
JM

SP

supaheld

TL
rs eD

bv

books
ers

jcs om
esA

da
yG

les

Fl0range

illym
oad

ro

w
aJe
s sc13ns Franz wo eleben
va

erer
87

oks
bo
ez R ey rits

ndro799imJPion
diah

ee
i_erian

ica Numm
csdotc
ne

frod
a_enyc

2
samp
zm

de
Mauerfal
D3rB3obacht3r

ago
tin ds ijsF

nda

en 20 ric
cznweb

go

co
ganebo

wand
chaospira
m

G
dueri

Jede

elizabe
sfeed

torma
in

jcsjay

snob
m

OlafScholz

In 9e0jp e

rre
yla
r _L noF

lig
tesse
ari

ahn
tbw

Th fatim
tail
alld

sp
Tay

X
su thy

La rk ss_
m
ort

ther
esl

XX

Ne

eD uthe

on
nki o

SPIEG fibre
Ideen

atB
eD

mo
gnew

bike
ale ob

biozul

dP
erMa
to
iko Tim

de H
joshca
ky
iggs

a le
thusa

She
rd4U_
firmreadgoldtris

lo
za

MontrealSauce

nt uib
Be Be

nixs

moo

eie
tin
atr

Cyclin

Bundes_G

lla
gazin
getlychee

n
EL_
oe
ijs

th
sP

Pro ekh
traeum
euNetzros
Rits

Bro
vo

ap
Ata 234

Lo
ph

fraukeld
No

HL

fV uis
nXploit
em

p
nF

ca infil
diN

mr

ee
riF 2
matthich
a

liz
nde

ns
ov h

ale
de

e 00p
autis ssivau
i art

tra
enSek
philsweeney

pre
u ticho
ro

r75 tok
YourAno

ShoKN
Au
be

_bolte

Blit keel
ILiv
rt

e_av
zW y
eCing
tor
ya

of0fe0
tism
acmebench

in

e
usleb

billo
Uru

fm
Sho
en

ater
calsilcock

ials
Wikipedia Voting Wikipedia Adminship Road Network Twitter Data

• Public data is available in a variety of formats


– Linked list, tab separated, labeled/unlabeled
• Requires parsing and standardization
• Proposed formats
– Tab separated triples in ASCII file with labels removed
– MMIO ASCII format: math.nist.gov/MatrixMarket

Amazon cloud services will be used to host particularly large data sets

Graph Challenge - 10
The Matrix Market Exchange Formats: Initial Design, Boisvert, Pozo & Remington (1996)
Outline

• Introduction

• Data Sets

• Static Graph Isomorphism Challenge

• Metrics

• Summary

Graph Challenge - 11
Static Subgraph Isomorphism Problem

Vertex-Based Array-Based

• Given a sub-graph H and • Given sub-graph adjacency array B and


a larger graph G a larger graph adjacency array A

• Is there a 1-to-1 mapping of the vertices • A subgraph isomorphism exists iff there
in H to vertices in G such that every is a permutation array P where
edge in H is also in G ?
B = PT A P

Bold uppercase indicates a matrix, Bold lowercase indicates a vector, lowercase indicates scalars, sets, ...

Graph Challenge - 12
Scaling Observations

• Two key parameters govern the complexity of subgraph isomorphism


– Size of G and H
– Nominal complexity is |G||H|
• Large G and H is may be too challenging
• Large G or H is more practical

• Small G and G and H are of same order is less interesting


– Focus of program is large graphs

• Large G and small H (G is much larger than H) is more interesting


– Example: H is a triangle (i.e., triangle finding algorithm)
– Example: H is a k-truss (i.e., k-truss algorithm)

Graph Challenge - 13
Triangle Counting

• Triangles are the most basic non-trivial subgraph


– Set of three mutually adjacent vertices in a graph
• Number of triangles in a graph is an important metric used in applications such as
– Social network mining (e.g.: clustering coefficient calculation)
– Link classification and recommendation
– Cyber security
– Intelligence
3
– Functional biology 1
– Spam detection
• Example of triangles in a Graph 4
5 1
– Triangle 1 : Nodes 3,4,5
– Triangle 2 : Nodes 2,4,5 2 2

Graph Challenge - 14 Counting and Sampling Triangles from a Graph Stream


A. Pavan, K. Tangwongsan, S. Tirthapura & Kun-Lung Wu, VLDB 2013
Triangle Counting and Enumeration
- Example Algorithm -

• Lower complexity array approach


gle listing results for
• Triangle SNAP datasets.
enumeration has big I/O Article

Information Visualization

• Triangle counting has minimal I/O Graphing trillions of triangles


1–10
! The Author(s) 2016
Reprints and permissions:
sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/1473871616666393
ivi.sagepub.com

Paul Burkhardt

rkhardt Abstract
5
The increasing size of Big Data is often heralded but how data are transformed and represented is also pro-
foundly important to knowledge discovery, and this is exemplified in Big Graph analytics. Much attention has
been placed on the scale of the input graph but the product of a graph algorithm can be many times larger
hich every vertex is connected to every other vertex, Theorem 1. Given G P
and the Hadamard product
than the input. This is true for many graph problems, such as listing all triangles in a graph. Enabling scal-
able graph
2 s exploration for Big Graphs requires new 1 approaches 2 to algorithms, architectures, and visual analy-
d a cycle, i.e. a path with identical endpoints. P A), then D(v) =
(A (A A) and D(G) = s
tics. A brief tutorial is given to aid the argument for
2 thoughtful
v v of data in the context of graph
representation
1
analysis. 2 s algebraic method to reduce the arithmetic operations in counting and listing triangles in
Then a new
iangles in graphs are useful for complex network (A A) .
graphs ij
6 isijintroduced. Additionally, a scalable triangle listing algorithm in the MapReduce model will be pre-
sented followed by a description of the experiments with that algorithm that led to the current largest and
alysis based solely on structural, i.e. adjacency, fastest triangle listing benchmarks to date. Finally, a method for identifying triangles in new visual graph
Proof. The paths of length two, i.e. 2-paths, are iden-
exploration technologies is proposed.
ormation. Let D(v) be the local count of triangles
tified by A2 ; then element-wise multiplication A A2 s

volving vertex v and D(G) be the global count of tri- Keywords


retains
Graph, only those
scalable algorithms, fv, ugvisualendpoints
triangle counting, analytics, parallelof 2-paths
programming, that are
MapReduce
gles in G. There can be many more triangles than
also the fv, ug 2 E endpoints of edges. Then the sum
rtices and edges, and if the graph is completely con- Introduction th connected triplets in a graph2or network of entities,
Graph Challenge - 15
of elements in the v column vector
often depicted of AwhereAeachisvertex
as a triangle s the of the
cted, where G itself is a clique, the total number of Data, data, data, and more data! Obtaining data is just triangle represents an entity that is connected
1 3to the
Triangle Counting using miniTri
- Example Algorithm -

• Array based algorithm


– Given a graph adjacency array A and incidence array E
3
C = AE
nT = nnz(C)/3
4
• Multiplication is overloaded such that 5 1

2
C(i,j) = {i,x,y} iff
A(i,x) = A(i,y) = 1 and ET(x,j) = ET(y,j) = 1
0 1
0 1 0 1 ; ; ; ; ; ;
0 0 0 0 1 1 0 0 0 0 0 B C
B C B 0 1 0 0 1 0 C B ; ; ; ; ; {2, 4, 5} C
0 0 0 1 1 B C
C=AET ==BB@ C
B C
A B
A = B 0 0 0 1 1 C
C E B
T
B = B 0 0 1 1 0 0 C
C
; ; ; ; ; {3, 4, 5} C
@ 0 1 1 0 1 A @ 0 0 0 1 1 1 A ; {4, 2, 5} {4, 3, 5} ; ; ; A
1 1 1 1 0 1 1 1 0 0 1 ; ; ; {5, 3, 4} {5, 2, 4} ;

miniTri represented in compact linear algebra-based operations


Graph Challenge - 16 Kokkos/Qthreads Task-Parallel Approach to Linear Algebra Based Graph Analytics
Wolf, Edwards & Olivier, IEEE HPEC 2016
K-Truss

• A graph is a k-truss if each edge is part of at least k-2 triangles


• A generalization of a clique (a k-clique is a k-truss), ensuring a minimum level of
connectivity within the graph

• Traditional technique to find a k-truss subgraph:


– Compute the support for every edge
– Remove any edges with support less than k-2 and update the list of edges
– When all edges have support of at least k-2, we have a k-truss

Example 3-truss

Graph Challenge - 17 Graphulo: Linear algebra graph kernels for NoSQL databases
Gadepally, Bolewski, Hook, Hutchison, Miller & Kepner, IPDPS GABB 2015
K Truss: Array Formulation
- Example Algorithm -

Algorithm
• If E is the unoriented incidence array
(rows are edges and columns are R=EA
vertices) of graph G, and A is the x = find( (R == 2 ) 1 < k − 2 )
adjacency array while x
Ex = E(x,∶)
• If G is a k-truss, the following must E = E(xc,∶)
be satisfied R = E(xc,∶) A
– (E A == 2) 1 > (k – 2)
R = R − E ( ExExT − diag(ExExT) )
x = find( (R == 2 ) 1 < k − 2 )

Graph Challenge - 18 Graphulo: Linear algebra graph kernels for NoSQL databases,
Gadepally, Bolewski, Hook, Hutchison, Miller & Kepner, IPDPS GABB 2015
Outline

• Introduction

• Data Sets

• Static Graph Isomorphism Challenge

• Metrics

• Summary

Graph Challenge - 19
Metrics

• Correctness
– Number of triangles is exact for a given graph
– Enumerating all k-trusses is exact for a given graph

• Performance
– Total number of edges in graph (edges)
– Execution time (seconds)
– Rate (edges/second)
– Energy (Watts)
– Rate per energy (edges/second/Watt)

Graph Challenge - 20
Summary

• Static sub-graph isomorphism graph challenge consists of


– Triangle finding
– K-Truss

• An implementation will perform these algorithms on provided data sets and report the
given metrics

• A submission to the IEEE HPEC Graph Challenge consists of a conference style paper
describing the approach, implementation, innovations, and results

• Both hardware and software should be described in solution


– Innovative hardware solutions are of interest (in addition to best algorithm for hardware)
– Special interest in performance for very large scale data sets (1-100 trillion edges)

Graph Challenge - 21

You might also like