
Social Network Analysis (SNA)

Node grouping

Degree in Data Science (Grau de Ciència de Dades) | Escola Tècnica Superior d’Informàtica | Universitat Politècnica de València
Sources


● Barabási, Albert-László: Network Science. Cambridge University Press, 2016
– Follows chapter 02 almost section by section
● Newman, Mark E. J.: Networks: an introduction. Oxford University Press, 2010
– Chapter 7 - Measures and metrics

Contents
1. Connectedness
2. Clustering coefficient
3. Central sets
4. Bipartite graphs
5. Assortativity
6. Other structures and measures
7. A case study

Social Network Analysis (SNA): Node grouping


Connectedness



CONNECTIVITY OF UNDIRECTED GRAPHS

Connected (undirected) graph: any two vertices can be joined by a path

[Figure: example graphs on nodes A to G; in the disconnected graph the largest component is labeled "Giant Component" and the rest are labeled "Isolates"]

A disconnected graph is made up of two or more connected components.
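Connected components can be found with breadth-first search (BFS): start from any unvisited node, collect everything reachable, and repeat. A minimal sketch with the standard library only, assuming a hypothetical adjacency dict `{node: set_of_neighbors}` (isolated nodes must appear as keys with an empty set); the example graph is illustrative, not the one in the figure.

```python
from collections import deque

def connected_components(adj):
    """Return the connected components of an undirected graph as a list of sets.

    adj: dict mapping each node to the set of its neighbors (assumed symmetric).
    """
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component = {start}
        queue = deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        components.append(component)
    return components

def undirected(edges):
    """Build a symmetric adjacency dict from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

# Illustrative graph: a 5-node component plus a separate edge F-G.
adj = undirected([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("A", "C"), ("F", "G")])
comps = connected_components(adj)
giant = max(comps, key=len)  # the largest (giant) component
print(len(comps), sorted(giant))  # 2 ['A', 'B', 'C', 'D', 'E']
```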

Bridge: an edge whose removal disconnects the graph (i.e., increases the number of connected components)
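The bridge definition can be checked directly, if inefficiently: delete each edge in turn and see whether the number of connected components grows. The sketch below does exactly that with the standard library; the edge-list input and the example graph are assumptions for illustration. For large graphs, depth-first-search methods such as Tarjan's bridge-finding algorithm find all bridges in linear time.

```python
from collections import deque

def count_components(nodes, edges):
    """Count connected components of the undirected graph (nodes, edges)."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, count = set(), 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u] - seen:
                seen.add(v)
                queue.append(v)
    return count

def bridges(nodes, edges):
    """Return the edges whose removal increases the number of components.

    Brute force, O(L * (N + L)); fine for small teaching examples.
    """
    base = count_components(nodes, edges)
    result = []
    for i, e in enumerate(edges):
        rest = edges[:i] + edges[i + 1:]
        if count_components(nodes, rest) > base:
            result.append(e)
    return result

# Illustrative graph: triangle A-B-C, attached to the path C-D-E.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]
print(bridges(nodes, edges))  # [('C', 'D'), ('D', 'E')]
```

Edges inside the triangle are not bridges, because every one of them lies on a cycle.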


CONNECTIVITY OF UNDIRECTED GRAPHS: ADJACENCY MATRIX

The adjacency matrix of a network with several components can be written in block-diagonal form, so that nonzero elements are confined to squares along the diagonal, with all other elements being zero.
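The block-diagonal form appears as soon as the nodes are numbered component by component, since no link ever joins two different components. A minimal sketch, assuming a hypothetical adjacency dict whose node labels deliberately interleave two components:

```python
from collections import deque

def components_in_order(adj):
    """List connected components, each as a list of nodes in BFS order."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        comp, queue = [s], deque([s])
        while queue:
            u = queue.popleft()
            for v in sorted(adj[u]):
                if v not in seen:
                    seen.add(v)
                    comp.append(v)
                    queue.append(v)
        comps.append(comp)
    return comps

def block_diagonal_matrix(adj):
    """Adjacency matrix with rows/columns ordered component by component."""
    order = [n for comp in components_in_order(adj) for n in comp]
    index = {n: i for i, n in enumerate(order)}
    A = [[0] * len(order) for _ in order]
    for u in adj:
        for v in adj[u]:
            A[index[u]][index[v]] = 1
    return order, A

# Illustrative graph with two components, {1, 3, 5} and {2, 4}.
adj = {1: {3}, 2: {4}, 3: {1, 5}, 4: {2}, 5: {3}}
order, A = block_diagonal_matrix(adj)
print(order)  # [1, 3, 5, 2, 4]
for row in A:
    print(row)  # a 3x3 block, then a 2x2 block, zeros elsewhere
```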
CONNECTIVITY OF DIRECTED GRAPHS

Strongly connected (directed) graph: there is a path from each node to every other node and vice versa (i.e., for every pair A, B there is both a path from A to B and a path from B to A).
Weakly connected (directed) graph: it is connected if we disregard the edge directions.

Strongly connected components (SCCs) can be identified (e.g., the blue nodes):

[Figure: two directed graphs on nodes A to G, with the strongly connected component highlighted in blue]

In-component: nodes that can reach the SCC; e.g., E, G (right panel)
Out-component: nodes that can be reached from the SCC; e.g., D, F (right panel)
Section 2.9
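The SCC, in-component and out-component definitions translate directly into graph searches. A minimal sketch using Kosaraju's two-pass depth-first search (one standard SCC algorithm; Tarjan's is another), assuming a hypothetical dict `{node: set_of_successors}` in which every node, including sinks, appears as a key. The example graph is invented so that its in- and out-components match the letters quoted above; it is not the figure's graph.

```python
def reverse(succ):
    """Predecessor map of a directed graph given as {node: set_of_successors}."""
    pred = {n: set() for n in succ}
    for u, vs in succ.items():
        for v in vs:
            pred[v].add(u)
    return pred

def strongly_connected_components(succ):
    """Kosaraju's algorithm: two depth-first passes; returns a list of sets.

    Recursive DFS for brevity; very large graphs would need an iterative version.
    """
    pred = reverse(succ)
    order, seen = [], set()

    def finish(u):  # pass 1: append u once all its successors are explored
        seen.add(u)
        for v in succ[u]:
            if v not in seen:
                finish(v)
        order.append(u)

    for n in succ:
        if n not in seen:
            finish(n)

    comps, assigned = [], set()

    def collect(u, comp):  # pass 2: explore the reversed graph
        assigned.add(u)
        comp.add(u)
        for v in pred[u]:
            if v not in assigned:
                collect(v, comp)

    for n in reversed(order):  # decreasing finishing time
        if n not in assigned:
            comp = set()
            collect(n, comp)
            comps.append(comp)
    return comps

def reach(sources, nbrs):
    """Nodes reachable from `sources` via `nbrs`, excluding the sources themselves."""
    stack, seen = list(sources), set(sources)
    while stack:
        for v in nbrs[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen - set(sources)

def is_weakly_connected(succ):
    """True if the graph is connected once edge directions are ignored."""
    pred = reverse(succ)
    undirected = {n: succ[n] | pred[n] for n in succ}
    start = next(iter(succ))
    return len(reach({start}, undirected)) + 1 == len(succ)

# Illustrative graph: cycle A->B->C->A, fed by G->E->A, draining into C->D->F.
succ = {"A": {"B"}, "B": {"C"}, "C": {"A", "D"}, "D": {"F"},
        "E": {"A"}, "F": set(), "G": {"E"}}
scc = max(strongly_connected_components(succ), key=len)
print(sorted(scc))                        # ['A', 'B', 'C']
print(sorted(reach(scc, reverse(succ))))  # in-component: ['E', 'G']
print(sorted(reach(scc, succ)))           # out-component: ['D', 'F']
print(is_weakly_connected(succ))          # True
```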
Clustering coefficient



CLUSTERING COEFFICIENT

(Local) clustering coefficient (Ci)

What fraction of your neighbors are connected to each other?


Example: node i with degree ki
Ci in [0,1]
CLUSTERING COEFFICIENT


Clustering coefficient Ci is a property of a node i

Let Li represent the number of links among the ki neighbors of node i:

$$C_i = \frac{2L_i}{k_i(k_i - 1)}$$

Average clustering coefficient (<C>)

$$\langle C \rangle = \frac{1}{N}\sum_{i=1}^{N} C_i$$

The average clustering coefficient <C> is a property of a graph
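The local coefficient C_i = 2L_i / (k_i(k_i - 1)) and its average <C> = (1/N) Σ_i C_i can be computed by counting, for each node, how many pairs of its neighbors are linked. A minimal sketch, assuming a hypothetical adjacency dict and the convention C_i = 0 for nodes with fewer than two neighbors (the convention networkx's `clustering` also uses). The example graph is illustrative, not the exercise graph.

```python
from itertools import combinations

def local_clustering(adj, i):
    """C_i = 2 L_i / (k_i (k_i - 1)), where L_i counts links among i's neighbors.

    Assumption: nodes with fewer than two neighbors get C_i = 0.
    """
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # L_i: number of neighbor pairs (u, v) that are themselves linked.
    L = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * L / (k * (k - 1))

def average_clustering(adj):
    """<C> = (1/N) * sum_i C_i over all N nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)

# Illustrative graph: triangle 1-2-3 plus a pendant node 4 attached to node 1.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
print(local_clustering(adj, 1))  # k=3, L=1 -> 2*1/(3*2) = 0.3333333333333333
print(average_clustering(adj))   # (1/3 + 1 + 1 + 0) / 4, about 0.583 (= 7/12)
```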
Exercise (Programming notebook-2)

Calculate the clustering coefficient of node 1 and the average clustering coefficient of the graph
Exercise
Calculate the clustering coefficient of each node and the average clustering coefficient of the graph
CLUSTERING COEFFICIENT

Tendency to form triangles

Many natural processes of link formation encourage the closing of "V"s (two links sharing a node) into triangles

Example 1: you’re more likely to meet new friends through common friends

Example 2: you’re more likely to follow an account u if you see content posted by u and re-posted by an account v that you already follow

This tendency motivates other measures related to clusters
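The closing of "V"s can also be measured globally, as the fraction of all "V"s (paths of length two) in the graph that are closed into triangles; this ratio is often called transitivity or the global clustering coefficient (networkx exposes it as `transitivity`). A minimal sketch, assuming a hypothetical adjacency dict:

```python
from itertools import combinations

def transitivity(adj):
    """Fraction of 'V's (paths u-i-v centered at i) that are closed by a link u-v.

    Equals 3 x (number of triangles) / (number of connected triples).
    """
    closed = total = 0
    for i, neighbors in adj.items():
        for u, v in combinations(neighbors, 2):  # every V centered at node i
            total += 1
            if v in adj[u]:
                closed += 1
    return closed / total if total else 0.0

# Illustrative graph: triangle 1-2-3 plus a pendant node 4 attached to node 1.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
print(transitivity(adj))  # 3 closed Vs out of 5 -> 0.6
```

On this graph the average of the local coefficients is 7/12 ≈ 0.583, not 0.6, so a global ratio and an averaged local value are genuinely different statistics; network summary tables may therefore report both.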
Central sets



CENTRAL SETS

• Network-level centrality (as opposed to node centrality)
• Subsets of “important nodes”
• Interconnected central nodes

Q: What are the (related) central fields of science?

Nodes: fields of science
Edges: fields are similar

A. Calderone, "A Wikipedia Based Map of Science." Figshare (2020), https://doi.org/10.6084/m9.figshare.11638932.v5


K-CORE

Maximal subnetwork in which each node has degree at least k (counting only links inside the subnetwork)

Main core, or simply the core:

the k’-core such that there is no (non-empty) k-core with k > k’

M.E.J. Newman. (2010). Networks: An Introduction. Oxford University Press.
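The k-core can be obtained by "peeling": repeatedly delete every node whose remaining degree is below k until none is left. Doing this for k = 1, 2, ... assigns each node its core number (the largest k whose k-core contains it), and the main core is the k-core with the largest k. A minimal sketch, assuming a hypothetical adjacency dict; networkx offers the same quantities as `core_number` and `k_core`.

```python
def core_numbers(adj):
    """Core number of each node: the largest k such that the node is in the k-core."""
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    alive = set(adj)  # at the start of round k, `alive` is exactly the k-core
    core = {}
    k = 0
    while alive:
        # Peel every node with remaining degree <= k (cascading); they have core number k.
        stack = [n for n in alive if degree[n] <= k]
        while stack:
            n = stack.pop()
            if n not in alive:
                continue
            alive.remove(n)
            core[n] = k
            for m in adj[n]:
                if m in alive:
                    degree[m] -= 1
                    if degree[m] <= k:
                        stack.append(m)
        k += 1
    return core

def k_core(adj, k):
    """Nodes of the k-core (possibly empty)."""
    return {n for n, c in core_numbers(adj).items() if c >= k}

# Illustrative graph: a 4-clique {1,2,3,4}, node 5 linked to 1 and 2, node 6 linked to 5.
adj = {1: {2, 3, 4, 5}, 2: {1, 3, 4, 5}, 3: {1, 2, 4}, 4: {1, 2, 3},
       5: {1, 2, 6}, 6: {5}}
cores = core_numbers(adj)
k_max = max(cores.values())
print(k_max, sorted(k_core(adj, k_max)))  # 3 [1, 2, 3, 4]
```

Note that node 5 has degree 3 in the full graph but is not in the 3-core: once node 6 is peeled, it keeps only two links.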


K-CORE

Cores used to find structural patterns of healthy cells that are lost in cancer

“Nodes with high inter-connectedness as opposed to high connectedness are conserved in the healthy Gene co-Expression Network”
K-CORE

A large core in procurement markets is associated with higher corruption risk

“We study the structure of these networks in each member state, identify their cores, and find that highly centralized markets tend to have higher corruption risk”
K-CORE

Back to the science map…
[Figure: science map with core nodes distinguished from other nodes]


K-CORE

The core of science consists of computational/mathematical fields!
K-CORE

Words of caution

• There is no formal reason to suppose that k-cores are linked to node roles or behaviors
• Being strongly interconnected does not necessarily mean being central (important) to the overall network
Bipartite graphs



BIPARTITE GRAPHS

Bipartite graph (or bigraph)

A graph G = (V, E) whose node set V can be divided into two disjoint sets VL and VR such that every link connects a node in VL to one in VR; VL and VR are therefore independent sets (no two nodes within the same set are linked).

[Figure: a bipartite graph with node sets VL (left) and VR (right)]

Examples:
• Hollywood actor network
• Collaboration networks
• Disease network (diseasome)
• Ingredient-flavor bipartite network

Y.-Y. Ahn, S. E. Ahnert, J. P. Bagrow, A.-L. Barabási: Flavor network and the principles of food pairing, Scientific Reports 1, 196 (2011).
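Whether a graph admits the split into VL and VR can be tested by 2-coloring it with BFS: each node's neighbors must go to the opposite set, and the test fails exactly when some link would fall inside one set (equivalently, when the graph has an odd cycle). A minimal sketch, assuming a hypothetical adjacency dict; the actor-movie example is invented for illustration. networkx offers a similar check as `is_bipartite`.

```python
from collections import deque

def bipartition(adj):
    """Try to split the nodes into two independent sets (V_L, V_R) by 2-coloring.

    Returns (V_L, V_R), or None if some link joins two nodes of the same set.
    """
    side = {}
    for start in adj:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in side:
                    side[v] = 1 - side[u]  # neighbors go to the opposite set
                    queue.append(v)
                elif side[v] == side[u]:
                    return None  # link inside one set: not bipartite
    left = {n for n, s in side.items() if s == 0}
    return left, set(adj) - left

# Illustrative actor-movie graph: actors a1..a3, movies m1, m2.
adj = {"a1": {"m1"}, "a2": {"m1", "m2"}, "a3": {"m2"},
       "m1": {"a1", "a2"}, "m2": {"a2", "a3"}}
print(bipartition(adj))
triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
print(bipartition(triangle))  # None
```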
TRIPARTITE NETWORK

Assortativity
Newman, Mark E. J.; Networks: an introduction; Oxford University Press (2010)



ASSORTATIVE MIXING (or HOMOPHILY)

Tendency of nodes to connect to other nodes that are like them in some way

“Some way” could mean any node attribute. In social networks: age, income, race, social interests, ...

Sexual relationships are mostly disassortative (e.g., by gender)

Assortativity has substantial effects on network structure

Image: Friendship network at a US high school
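For a categorical attribute (such as group membership), a common way to quantify assortative mixing is Newman's coefficient r = (Σ_t e_tt - Σ_t a_t²) / (1 - Σ_t a_t²), where e_st is the fraction of edges joining a node of type s to a node of type t (each undirected edge counted half in each direction) and a_t = Σ_s e_ts. r is 1 when all links stay within a type and negative when links preferentially cross types. A minimal sketch under those definitions; the edge-list input and the toy friendship graph are assumptions. networkx offers `attribute_assortativity_coefficient` for the same quantity.

```python
import math
from collections import defaultdict

def attribute_assortativity(edges, label):
    """Assortativity coefficient r for a categorical node attribute."""
    m = len(edges)
    e = defaultdict(float)
    for u, v in edges:
        # Each undirected edge adds half its weight in each direction.
        e[(label[u], label[v])] += 0.5 / m
        e[(label[v], label[u])] += 0.5 / m
    types = {t for pair in list(e) for t in pair}
    a = {t: sum(e[(t, s)] for s in types) for t in types}
    trace = sum(e[(t, t)] for t in types)
    sum_a2 = sum(x * x for x in a.values())
    if math.isclose(sum_a2, 1.0):  # a single type: r is undefined
        return math.nan
    return (trace - sum_a2) / (1 - sum_a2)

# Illustrative friendship graph: nodes 1-3 in group "x", nodes 4-6 in group "y".
label = {1: "x", 2: "x", 3: "x", 4: "y", 5: "y", 6: "y"}
edges = [(1, 2), (2, 3), (4, 5), (5, 6), (3, 4)]
print(round(attribute_assortativity(edges, label), 3))  # 0.6
```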
ASSORTATIVE MIXING BY DEGREE

Of particular interest is assortative (disassortative) mixing by degree: nodes connect to nodes with similar (different) degree
(a) Assortative by degree: dense core of high-degree nodes and a periphery of lower-degree nodes
(b) Disassortative by degree: star-like structures
ASSORTATIVE MIXING BY DEGREE

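The covariance for the degrees ki and kj at the two ends of an edge is:

$$\operatorname{cov}(k_i, k_j) = \frac{1}{2m}\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right) k_i k_j$$

where A is the adjacency matrix and m the number of edges. We can normalize by the maximum value of the covariance (perfect mixing, where every edge joins nodes of equal degree) to get the

Assortativity coefficient (or correlation coefficient)

$$r = \frac{\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right) k_i k_j}{\sum_{ij}\left(k_i \delta_{ij} - \frac{k_i k_j}{2m}\right) k_i k_j}$$

δij = 1 if i = j
δij = 0 otherwise

r is the Pearson correlation coefficient of the degrees at either end of an edge, with -1 ≤ r ≤ 1:

-1 perfectly disassortative network
0 no correlation between the degree of a node and the degrees of its neighbors
1 perfectly assortative network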
ASSORTATIVE MIXING BY DEGREE

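• Computing r as expressed above is O(N²), since the sums run over all pairs of nodes
• Optimization for sparse networks, O(L):

$$r = \frac{S_1 S_e - S_2^2}{S_1 S_3 - S_2^2}$$

where $S_e = \sum_{ij} A_{ij} k_i k_j = 2\sum_{(i,j) \in E} k_i k_j$ and $S_p = \sum_i k_i^p$ for p = 1, 2, 3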
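A minimal sketch of the O(L) computation of degree assortativity, assuming a simple undirected graph given as an edge list and the sparse form r = (S1·Se - S2²) / (S1·S3 - S2²), with S_p = Σ_i k_i^p and Se = 2 Σ over edges of k_i·k_j. Everything is integer arithmetic until the final division. networkx's `degree_assortativity_coefficient` can serve as a cross-check.

```python
import math

def degree_assortativity(edges):
    """Degree assortativity r in O(L) for a simple undirected edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    S1 = sum(degree.values())                    # = 2L
    S2 = sum(k ** 2 for k in degree.values())
    S3 = sum(k ** 3 for k in degree.values())
    Se = 2 * sum(degree[u] * degree[v] for u, v in edges)
    denominator = S1 * S3 - S2 ** 2
    if denominator == 0:  # every node has the same degree: r is undefined
        return math.nan
    return (S1 * Se - S2 ** 2) / denominator

star = [(0, 1), (0, 2), (0, 3)]                     # hub linked only to leaves
two_groups = [(1, 2), (2, 3), (1, 3), (4, 5)]       # triangle plus a separate edge
print(degree_assortativity(star))        # -1.0 (perfectly disassortative)
print(degree_assortativity(two_groups))  # 1.0 (perfectly assortative)
```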
Basic statistics for a number of networks

Newman, Mark E. J.; Networks: an introduction; Oxford University Press (2010)



n: Total number of vertices
m: Total number of edges
c: Mean degree
S: Fraction of vertices in the largest component (or the largest weakly connected component in the case of a directed network)
l: Mean geodesic distance between connected vertex pairs
α: Exponent α of the degree distribution if the distribution follows a power
law (or “–” if not; in/out-degree exponents are given for directed graphs)
C: Clustering coefficient C from Eq. (7.41)
CWS: Clustering coefficient from the alternative definition of Eq. (7.44)
r: Degree correlation (or assortativity) coefficient from Eq. (7.82)



Basic statistics for a number of networks

• None of the values of r is of very large magnitude: mixing by degree is only mild
• Clear tendency for social networks to have positive r
Nodes tend to form small groups of low-degree or of high-degree nodes
Typical feature: multiedges
• Technological, information and biological networks tend to have negative r
The number of edges that fall between high-degree nodes is small
Typical feature: single edges
A case study: Protein-protein interaction network



THREE CENTRAL QUANTITIES IN NETWORK SCIENCE

A. Degree distribution: pk
B. Average path length: <d>
C. Clustering coefficient: Ci (average <C>)
[Figure: three layers of cellular networks: GENOME (protein-gene interactions), PROTEOME (protein-protein interactions), METABOLISM (biochemical reactions, e.g., the citrate cycle)]
A CASE STUDY: PROTEIN-PROTEIN INTERACTION NETWORK

[Figure panels: Metabolic network; Protein interactions]


A CASE STUDY: PROTEIN-PROTEIN INTERACTION NETWORK


Undirected network

N = 2,018 proteins as nodes
L = 2,930 binding interactions

Average degree <k> = 2.90

Not connected: 185 components; the largest (the giant component) has 1,647 nodes
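As a quick consistency check of these figures: in an undirected network every link contributes to the degree of two nodes, so <k> = 2L/N, and the giant component covers a fraction 1,647/2,018 of the proteins:

```latex
\langle k \rangle = \frac{2L}{N} = \frac{2 \times 2930}{2018} = \frac{5860}{2018} \approx 2.90,
\qquad
\frac{N_{\text{giant}}}{N} = \frac{1647}{2018} \approx 0.82
```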
A CASE STUDY: PROTEIN-PROTEIN INTERACTION NETWORK

pk is the probability that a node has degree k

Nk = number of nodes with degree k

pk = Nk / N
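The estimate pk = Nk / N amounts to counting degrees and dividing by the number of nodes. A minimal sketch, assuming a hypothetical adjacency dict (not the protein data):

```python
from collections import Counter

def degree_distribution(adj):
    """p_k = N_k / N, where N_k is the number of nodes with degree k."""
    N = len(adj)
    N_k = Counter(len(nbrs) for nbrs in adj.values())
    return {k: N_k[k] / N for k in sorted(N_k)}

# Illustrative graph: degrees are 3, 2, 2, 1.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
p = degree_distribution(adj)
print(p)  # {1: 0.25, 2: 0.5, 3: 0.25}
```

By construction the values of p sum to 1.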
A CASE STUDY: PROTEIN-PROTEIN INTERACTION NETWORK

Diameter (largest distance): dmax = 14

Average path length: <d> = 5.61
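In an unweighted graph, both quantities follow from one BFS per node. Since the protein network is not connected, distances are taken only between connected pairs (the convention the "mean geodesic distance between connected vertex pairs" statistic uses). A minimal sketch, assuming a hypothetical adjacency dict:

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter_and_average_distance(adj):
    """Return (d_max, <d>) over all ordered pairs of distinct, connected nodes.

    One BFS per node: O(N * (N + L)).
    """
    d_max, total, pairs = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                d_max = max(d_max, d)
                total += d
                pairs += 1
    return d_max, (total / pairs if pairs else 0.0)

# Illustrative graph: path 1-2-3-4 plus a separate edge 5-6 (cross-component pairs skipped).
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}, 5: {6}, 6: {5}}
d_max, avg_d = diameter_and_average_distance(adj)
print(d_max, avg_d)  # d_max = 3, <d> = 22/14, about 1.571
```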
A CASE STUDY: PROTEIN-PROTEIN INTERACTION NETWORK

Average clustering coefficient: <C> = 0.12
