
Social Network Analysis with Apache Spark and Neo4j

Charles Copley
Nathan Begbie
Eli Copley
OVERVIEW

• Introduction to social network concepts
• Workshop data & data handling
• Applied visualisation and network computations
By the end of the workshop, participants will have the basic
skills needed to learn to use Apache Spark with Neo4j for
social network analysis.
01
Introduction to
Social Networks

Introduction to Concepts & Terminology Used in Social Network Analysis


SOCIAL NETWORK ANALYSIS

Levels of Analysis

→ Individuals affect other individuals
→ Individual behaviours and decisions determine network structures and dynamics
→ Network properties and an individual’s network location affect individual behaviour
→ Network structures, dynamics and evolution mechanisms at time 1 affect network dynamics and structures at time 2
SOCIAL NETWORK CONCEPTS & TERMINOLOGY

Isolates

Component

Edge

Node
(degree = 4)
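
The terms on this slide can be made concrete with a small sketch. The toy network below is hypothetical (it is not the workshop data), and the helper names are illustrative only. It computes a node's degree, lists the isolates, and finds the components of an undirected network.

```python
def build_adjacency(nodes, edges):
    """Undirected adjacency sets; `nodes` lists every node so isolates are kept."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree(adj, node):
    """Degree = number of edges attached to the node."""
    return len(adj[node])


def isolates(adj):
    """Isolates are nodes with no edges at all."""
    return sorted(n for n, nbrs in adj.items() if not nbrs)


def components(adj):
    """Connected components, found with a depth-first search."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        result.append(comp)
    return result


# Hypothetical toy network: "a" has four edges, "h" has none.
nodes = ["a", "b", "c", "d", "e", "f", "g", "h"]
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("f", "g")]
adj = build_adjacency(nodes, edges)

print(degree(adj, "a"))      # 4
print(isolates(adj))         # ['h']
print(len(components(adj)))  # 3
```

In this sketch an isolate counts as a component of size one, which is why three components are reported.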
SOCIAL NETWORK CONCEPTS & TERMINOLOGY

Homophily
Birds of a feather
flock together

Image from Moody, J. (2004)
Sourced by Ambika Samarthya-Howard, Praekelt.Org
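
One simple way to look for homophily is to measure what share of edges join two nodes with the same attribute value. The sketch below assumes a hypothetical friendship list and a hypothetical `grade` attribute; it is a descriptive measure only, not the method used in the image above.

```python
def same_attribute_share(edges, attribute):
    """Fraction of edges whose two endpoints share the same attribute value."""
    if not edges:
        return 0.0
    same = sum(1 for a, b in edges if attribute[a] == attribute[b])
    return same / len(edges)


# Hypothetical students and their grades.
grade = {"ann": 7, "bob": 7, "cat": 8, "dan": 8, "eve": 7}
friendships = [("ann", "bob"), ("ann", "eve"), ("cat", "dan"), ("bob", "cat")]

share = same_attribute_share(friendships, grade)
print(share)  # 0.75: three of the four friendships are within a grade
```

A high share only suggests homophily when it is clearly above what random mixing would produce for the same group sizes, so it should be read against that baseline.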
SOCIAL NETWORK CONCEPTS & TERMINOLOGY

Influence and Selection

We influence and are influenced by the people we are connected to, but we also select those who are similar to us.
SOCIAL NETWORK CONCEPTS & TERMINOLOGY

Triadic
Closure

Triad
SOCIAL NETWORK CONCEPTS & TERMINOLOGY

How connected are your friends?

Clustering coefficient 1/3 | Clustering coefficient 2/3 | Clustering coefficient 3/3
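
The three values can be reproduced with a short sketch of the local clustering coefficient: the number of links that exist between your friends, divided by the number of links that could exist between them. The ego network below ("you" with three friends) is a hypothetical example chosen to match the 1/3, 2/3 and 3/3 cases.

```python
from fractions import Fraction
from itertools import combinations


def make_adjacency(edges):
    """Undirected adjacency sets built from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def local_clustering(adj, node):
    """Links among the node's neighbours divided by the possible links."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return Fraction(0)  # undefined for fewer than two friends; 0 by convention
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return Fraction(links, k * (k - 1) // 2)


# "you" has three friends, so three friend-to-friend links are possible.
ego = [("you", "a"), ("you", "b"), ("you", "c")]
one_link = make_adjacency(ego + [("a", "b")])
two_links = make_adjacency(ego + [("a", "b"), ("b", "c")])
three_links = make_adjacency(ego + [("a", "b"), ("b", "c"), ("a", "c")])

for network in (one_link, two_links, three_links):
    print(local_clustering(network, "you"))
```

Exact fractions are used so the results compare directly with the values on the slide.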
PageRank

Your influence is determined by the influence of people you are connected to.
Your influence is passed on to people that you link to.
Then you iterate… MANY TIMES

Figure labels: PR=1.35, PR=1.35, PR=0.15
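
The iteration can be sketched in a few lines. This version uses the non-normalised PageRank formulation, PR(v) = (1 - d) + d × Σ PR(u) / out(u) over the nodes u linking to v, with a damping factor d = 0.85. Both the formulation and the damping value are assumptions; they are consistent with the PR=0.15 label, since a node with no incoming links settles at 1 - d. The three-node graph is hypothetical and is not the one drawn on the slide.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Iterative PageRank; `links` maps each node to the nodes it links to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    rank = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Every node starts each round with the baseline (1 - damping).
        new_rank = {n: 1.0 - damping for n in nodes}
        for source, targets in links.items():
            if not targets:
                continue  # a node with no outgoing links passes nothing on
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: c links to a, and a and b link to each other.
links = {"c": ["a"], "a": ["b"], "b": ["a"]}
rank = pagerank(links)

print(round(rank["c"], 2))  # 0.15: nobody links to c
print(rank["a"] > rank["b"] > rank["c"])  # True
```

Production tools such as Spark's GraphX or Neo4j's graph algorithms handle details this sketch ignores, such as nodes with no outgoing links and convergence checks.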
02
Workshop
Data

Why and how we use specific tools to handle large network datasets
DATASET

US National Longitudinal Study of Adolescent Health (Add Health)


Longitudinal study of a nationally representative sample of adolescents in grades 7-12 in the
United States during the 1994-95 school year

● Includes race, gender and grade.
● See: http://www.cpc.unc.edu/projects/addhealth
● Reference: Goodreau, Handcock, Hunter, Butts and Morris (2008), A statnet Tutorial, Journal of Statistical Software, February 2008, Volume 24, Issue 9.
https://www.jstatsoft.org/article/view/v024i09
DATA HANDLING

Raw Data → Distributed Computation → Graph Database

Raw Data: holds your primary data (could also be in a database).
Distributed Computation: first import data into Spark for data handling, formatting and calculations.
Graph Database: then move the data into Neo4j, which allows you to query relationship patterns and conduct SNA.
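
The handling step in the middle of this pipeline can be sketched as follows. Plain Python stands in for Spark here so the example runs without a cluster; in the workshop setting the same cleaning would be done on a Spark DataFrame. The column names, the `Person` label, the `id` property and the cleaning rules are all assumptions. Only the `knows` relationship type comes from the workshop query.

```python
import csv
import io

# Hypothetical raw edge list with whitespace, a duplicate, a self-loop and a blank.
RAW = """source,target
1,2
2,3
 2 , 3
3,3
4,
"""

# Parameterised Cypher that a Neo4j driver could run once per batch of rows.
# The Person label and id property are assumptions for illustration.
LOAD_EDGES = (
    "UNWIND $rows AS row "
    "MERGE (a:Person {id: row.source}) "
    "MERGE (b:Person {id: row.target}) "
    "MERGE (a)-[:knows]->(b)"
)


def clean_edges(raw_text):
    """Parse raw CSV rows, dropping blanks, self-loops and duplicates."""
    seen, edges = set(), []
    for row in csv.DictReader(io.StringIO(raw_text)):
        src = (row.get("source") or "").strip()
        dst = (row.get("target") or "").strip()
        if not src or not dst or src == dst:
            continue
        if (src, dst) not in seen:
            seen.add((src, dst))
            edges.append({"source": src, "target": dst})
    return edges


def batches(rows, size):
    """Split rows into fixed-size batches, one per database transaction."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]


edges = clean_edges(RAW)
for batch in batches(edges, 1000):
    # With a driver session this would be: session.run(LOAD_EDGES, rows=batch)
    print(len(batch), "rows ready for", LOAD_EDGES[:20] + "...")
```

Sending rows in batches through one parameterised statement keeps the import to a small number of transactions rather than one per edge.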
03
Data
Practical

Visualising network data and computing basic metrics


DATA PRACTICAL

A recommender system could consist of searching for people connected to your friends, e.g. via LinkedIn.

Person 1 knows Person 2 → Person 2 knows Person 3


// Closed triads: p1 knows p2, p1 knows p3, and p3 knows p2
MATCH (p1)-[r1:knows]-(p2), (p1)-[r2:knows]-(p3), (p3)-[r3:knows]-(p2)
RETURN p1, p2, p3, r1, r2, r3
LIMIT 10
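
The friend-of-a-friend idea behind such a recommender can also be sketched outside the database. The example below assumes a hypothetical `knows` edge list and ranks the people who are two steps away from a person, and not yet connected to them, by the number of mutual friends. It illustrates the idea and is not a translation of the Cypher query above.

```python
from collections import Counter


def make_adjacency(edges):
    """Undirected adjacency sets built from a list of 'knows' pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def recommend(adj, person):
    """People two steps away from `person`, ranked by mutual friends."""
    counts = Counter()
    for friend in adj[person]:
        for candidate in adj[friend]:
            # Skip the person themselves and anyone they already know.
            if candidate != person and candidate not in adj[person]:
                counts[candidate] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical network: p1 knows p2 and p4; both of them know p3.
knows = [("p1", "p2"), ("p1", "p4"), ("p2", "p3"), ("p4", "p3"), ("p2", "p5")]
adj = make_adjacency(knows)

print(recommend(adj, "p1"))  # [('p3', 2), ('p5', 1)]
```

Here p3 is recommended first because two of p1's friends already know p3, which is the open triad that triadic closure predicts will close.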
Thank you!

Any questions?
charles@praekelt.org
nathan@praekelt.org
eli@praekelt.org
More Reading
Social Network Analysis with Big Data
Charles Copley, Head of Data Science at Praekelt:
https://medium.com/mobileforgood/social-network-analysis-using-apache-spark-and-neo4j-1ccba3c8af9a
Homophily and Influence
Sinan Aral (2013) What Would Ashton Do? Harvard Business Review. On how homophily and social location impact our choices.
https://hbr.org/2013/05/what-would-ashton-do-and-does-it-matter
Weak Ties, Social Capital
Granovetter, M. S. (1973) The Strength of Weak Ties. American Journal of Sociology, 78(6), 1360-1380.
