
CII-4O3

Social Network Analysis


Lecture 1 – Introduction
Outline

• Introduction
• Definitions
• Basic Concepts
Course Learning Outcomes (CLO)

• Explain the definitions, basic concepts, and theories of social network analysis
• Understand how these concepts and theories can help explain
different factors in network behaviors
Introduction
Social Network Analysis
Social Media: Many-to-Many

[Figure: Social Media at the center, linked to Social Networking, Blogs, Microblogging, Forum, Wiki, and Content Sharing]
Social Media: Many-to-Many
Various forms of Social Media

• Blog: WordPress, Blogspot, LiveJournal


• Forum: Yahoo! Answers, Epinions
• Media Sharing: Flickr, YouTube, Scribd
• Microblogging: Twitter, FourSquare
• Social Networking: Facebook, LinkedIn, Orkut
• Social Bookmarking: Del.icio.us, Diigo
• Wikis: Wikipedia, Scholarpedia, AskDrWiki
• Publishing: Blogging, Wiki

Characteristics of Social Media

• “Consumers” become “Producers”


• Rich User Interaction
• User-Generated Contents
• Collaborative environment
• Collective Wisdom
• Long Tail

Broadcast Media: Filter, then Publish
Social Media: Publish, then Filter

Characteristics of Social Media
• Participation
• social media encourages contributions and feedback from everyone who is interested. It blurs the
line between media and audience.
• Openness
• most social media services are open to feedback and participation. They encourage voting,
comments and the sharing of information. There are rarely any barriers to accessing and making
use of content – password-protected content is frowned on.
• Conversation
• whereas traditional media is about “broadcast” (content transmitted or distributed to an audience)
social media is better seen as a two-way conversation.
• Community
• social media allows communities to form quickly and communicate effectively. Communities share
common interests, such as a love of photography, a political issue or a favorite TV show.
• Connectedness
• Most kinds of social media thrive on their connectedness, making use of links to other sites,
resources and people.
Introduction
Why Networks?

• Rise of the Web and Social Media


• More data
• Shared vocabulary between (very different) fields
Why Networks?

• Network data is increasingly available:


• Large on-line computing applications where data can naturally be
represented as a network:
• On-line communities: Facebook
• Communication: Instant Messenger
• News and Social Media: Twitter, Blogging
• Network is a set of weakly interacting entities
• Links give added value:
• Google realized web-pages are connected
• Collective classification
Why Networks?

• How do we reason about networks?


• Empirical: look at large networks and see what you find
• Mathematical models: probabilistic, graph theory
• Algorithms for analyzing graphs
• What do we hope to achieve from models of networks?
• Patterns and statistical properties of network data
• Design principles and models
• Understand why networks are organized the way they are (predict behavior
of networked systems)
Networks and Representation
Social Network: A social structure made of nodes (individuals or
organizations) and edges that connect nodes in various
relationships like friendship, kinship etc.
• Graph Representation • Matrix Representation
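The two representations named above can be sketched in a few lines of Python. This is only an illustration: the four people and their ties are invented for the example, and `adjacency` and `matrix` are hypothetical variable names.

```python
# Hypothetical undirected friendship ties (invented for illustration).
people = ["Ana", "Budi", "Citra", "Dewi"]
ties = [("Ana", "Budi"), ("Budi", "Citra"), ("Citra", "Dewi"), ("Ana", "Citra")]

# Graph representation: an adjacency list mapping each node to its neighbours.
adjacency = {p: set() for p in people}
for u, v in ties:
    adjacency[u].add(v)
    adjacency[v].add(u)

# Matrix representation: matrix[i][j] == 1 when person i and person j are tied.
index = {p: i for i, p in enumerate(people)}
matrix = [[0] * len(people) for _ in people]
for u, v in ties:
    matrix[index[u]][index[v]] = 1
    matrix[index[v]][index[u]] = 1  # undirected ties give a symmetric matrix

for name, row in zip(people, matrix):
    print(f"{name:>5}", row)
```

The adjacency list is compact for sparse networks, while the matrix form makes linear-algebra methods available; both encode the same ties.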

Networks: Rich data
Networks of the Real-world [1]
• Information networks:
• World Wide Web
• Citation networks
• Blog networks
• Social Networks:
• Organizational networks
• Communication networks
• Collaboration networks
• Technological networks:
• Power grid
• Airline, road, river networks
• Telephone networks
• Internet
• Autonomous systems
Networks of the Real-world [2]

• Biological networks:
• Metabolic networks
• Food webs
• Neural networks
• Gene regulatory networks
• Language networks:
• Semantic networks
• Software networks:
• Call graphs
[Figure: sample of an online social network]
[Figure: protein interaction network]
[Figure: Sweden’s economic network of interlocked corporations]
[Figure: three models of epidemic spread in human contact networks]
Ice Breaking

Tell it in a Tweet!!!
Definitions
Social Network Analysis
Definitions

• Actor/Node/Vertex
• Edge/Relation/Tie
• Path
• Network/Graph
Actor/Node/Vertex

• An individual can have relationships with other individuals.


Edge/Relation/Tie

• Describes a particular, well-specified relationship between two actors/nodes/vertices.

Undirected

Directed
Graph

§ Graph G(V, E):


§ A set V of vertices or nodes
§ Connected by a set E of edges or links
§ Elements of E are unordered pairs (u, v), where u, v ∈ V
§ Vertices V = {A, B, C, D}
§ Edges E = {(A,B), (B,C), (C,D)}
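As a small sketch, the example graph above can be written down directly in Python. The vertex and edge sets are the ones listed on the slide; storing undirected edges as `frozenset` objects and the helper `degree` are choices made for this illustration only.

```python
# Vertices and edges exactly as listed above.
V = {"A", "B", "C", "D"}
E = {("A", "B"), ("B", "C"), ("C", "D")}

# Undirected: a frozenset ignores order, so (u, v) and (v, u) are the same edge.
undirected = {frozenset(edge) for edge in E}

# Directed: keep the ordered tuples, so ("A", "B") means A -> B only.
directed = set(E)

def degree(node):
    """Number of undirected edges touching `node`."""
    return sum(node in edge for edge in undirected)

print({v: degree(v) for v in sorted(V)})
print(frozenset(("B", "A")) in undirected)  # same undirected edge as (A, B)
print(("B", "A") in directed)               # no arc from B to A
```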
From Networks to Graphs

§ A network is a set of nodes connected by a set of edges


§ Graphs are mathematical representations of networks
• Networks are also called graphs
Vertices and edges in Networks
Path
• “In graph theory, a path in a graph is a finite or infinite sequence of edges
which connect a sequence of vertices which, by most definitions, are all
distinct from one another. In a directed graph, a directed path is again a
sequence of edges (or arcs) which connect a sequence of vertices, but with
the added restriction that the edges all be directed in the same direction.”
(Wikipedia 2015)
• Many different types of specially named paths:
• Eulerian path (crosses each edge exactly once, as in Königsberg)
• Hamiltonian path (visits each node exactly once)
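Euler's observation about Königsberg can be checked with a degree count. The sketch below assumes a connected graph (connectivity is not verified) and uses arbitrary labels A to D for the four land masses; the function names are hypothetical.

```python
from collections import Counter

# The seven bridges of Königsberg as a multigraph over four land masses.
# Labels are arbitrary; A is the island touched by five bridges.
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

def odd_degree_nodes(edges):
    """Return the nodes that are touched by an odd number of edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(n for n, d in degree.items() if d % 2 == 1)

def has_eulerian_path(edges):
    # Euler's condition for a connected graph: 0 or 2 odd-degree nodes.
    # Connectivity is assumed here, not checked.
    return len(odd_degree_nodes(edges)) in (0, 2)

print(odd_degree_nodes(bridges))   # all four land masses have odd degree
print(has_eulerian_path(bridges))  # so no walk crosses each bridge exactly once
```

By contrast, a simple chain such as A-B-C-D has exactly two odd-degree nodes (its endpoints), so the same check accepts it.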
Basic Concept
Social Network Analysis
Data

• Data: facts, measurements or text collected for reference or analysis (Oxford dictionary)
• Unstructured data: data that does not fit a certain data structure (text,
images, audio, video, a list of numeric measurements)
• Structured data: data that fits a certain data structure
(table, graph/network, tree, etc.)
Data Evolution

• Census data (60’s)


• Transaction data (80’s)
• Micro event data (00’s)
• Social data (2010)
• 2020 ??
Moore’s law & Transistors

CTI-3A3
Applied Social Network Analysis
Big Data
Mining Social Network Data

• Mining social networks has a long history in social science
• Wayne Zachary’s PhD work (1970) observed social ties
and rivalries in a university karate club
• During his observation, conflicts led the group to split
• Split could be explained by a minimum cut in the social
network
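To make the minimum-cut idea concrete, here is a minimal sketch using augmenting paths (Edmonds-Karp style) on a toy graph. The six-node network below is invented for illustration and is not Zachary's data; each tie is assumed to have capacity 1, and `min_cut` is a hypothetical helper.

```python
from collections import defaultdict, deque

def min_cut(edges, source, sink):
    """Return (cut size, nodes on the source side) for unit-capacity ties."""
    capacity = defaultdict(lambda: defaultdict(int))
    for u, v in edges:
        capacity[u][v] += 1
        capacity[v][u] += 1

    def reachable():
        # Breadth-first search over edges that still have residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in capacity[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable()
        if sink not in parent:
            # No augmenting path left: the reachable set is one side of the cut.
            return flow, set(parent)
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(capacity[u][w] for u, w in path)
        for u, w in path:
            capacity[u][w] -= bottleneck
            capacity[w][u] += bottleneck
        flow += bottleneck

# Two tight groups joined by a single tie (b, c); all names are hypothetical.
ties = [("instructor", "a"), ("instructor", "b"), ("a", "b"),
        ("president", "c"), ("president", "d"), ("c", "d"), ("b", "c")]
cut_size, instructor_side = min_cut(ties, "instructor", "president")
print(cut_size, sorted(instructor_side))
```

In this toy network the cheapest way to separate the two leaders is to remove the single bridging tie, which splits the members into the two groups.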
Social Media Mining
• Social media platforms: Facebook, Twitter, LinkedIn, Reddit,
YouTube, Blogger, . . .
• Platforms generate enormous amounts of (un)structured data
• Social media mining & analytics: analyzing this data in order to
get insight in user(s), trends, usage patterns, the platform itself,
...
• Text mining
• Trend analysis
• Sentiment mining
• Topic modelling
• Social network analysis
Networks: Rich Social Data

• Traditional obstacle:
• Large-scale
• Realistic
• Completely mapped
• Now: large on-line systems leave detailed records of social activity
• On-line communities: MySpace, Facebook, LiveJournal
• Email, blogging, ecommerce, instant messaging
• On-line publications repositories, arXiv, MedLine
Networks: A Matter of Scale

• Network data spans many orders of magnitude:


• 436‐node network of email exchange over 3‐months at corporate
research lab [Adamic‐Adar, SocNets ‘03]
• 43,553‐node network of email exchange over 2 years at a large university
[Kossinets‐Watts, Science ‘06]
• 4.4‐million‐node network of declared friendships on a blogging
community [Liben‐Nowell et al., PNAS ‘05, Backstrom et al., KDD ‘06]
• 240‐million‐node network of all IM communication over a month on
Microsoft Instant Messenger [Leskovec‐Horvitz, WWW ‘08]
Networks: Scale Matters

• How does massive network data compare to small‐scale studies?


• Massive network datasets give us more and less:
• More: can observe global phenomena that are genuine, but literally
invisible at smaller scales
• Less: don’t really know what any node or link means. Easy to measure
things, hard to pose right questions
• Goal: Find the point where the lines of research converge
Networks: Structure & Process

• What have we learned about large networks?


• Structure: Many recurring patterns
• Scale‐free, small‐world, locally clustered, bow‐tie, hubs and authorities,
communities, bipartite cores, network motifs, highly optimized tolerance
• Processes and dynamics:
• Information propagation, cascades, epidemic thresholds, viral marketing,
virus propagation, diffusion of innovation
Structure of Networks
• What is the structure of a large network?
• Why and how did it come to have such a structure?
Diffusion in Networks
• One of the networks is a spread of a disease, the other one is product
recommendations
• Which is which?
Social Network Analysis
Social Networks Analysis
§ “Social network analysis (SNA) is a strategy for investigating social structures through the
use of network and graph theories. It characterizes networked structures in terms of nodes
(individual actors, people, or things within the network) and the ties or edges (relationships or
interactions) that connect them. Examples of social structures commonly visualized through social
network analysis include social media networks, friendship and acquaintance networks, kinship,
disease transmission, and sexual relationships.” (Wikipedia 2015).
§ “Social network analysis is inherently an interdisciplinary endeavor. The concept of social
network analysis developed out of a propitious meeting of social theory and application with formal
mathematical, statistical, and computing methodology.” Stanley Wasserman and Katherine Faust
1994
§ “Social network analysis is neither a theory nor methodology. Rather, it is a perspective or
paradigm. It takes as its starting point the premise that social life is created primarily and most
importantly by relations and the patterns they form.” Alexandra Marin and Barry Wellman 2014
Social Network as a Viewpoint
• Characteristics of social networks and social networks as analogy of some parts of the
society are quite common in all major social science fields (economics, sociology,
anthropology, political science, psychology).
• Social Network Analysis is a paradigmatic viewpoint of society: it holds that the
social universe is formed of, and can be modeled with, networks.
• Not just a collection of methods, but also a strong theoretical perspective: rooted in
network and graph theory (in mathematics and in computer science) and in discrete
mathematics.
Development in Network Analysis
• Euler and the Königsberg bridge problem, already in 1736, provided the first
principles of graph theory.
• Most active developments in the early and mid 1900s.
• Sociogram – a mathematical model of a social group, in the 1930s (Jacob L.
Moreno)
• Social structure – based on a network model, in the 1940s (Alfred Radcliffe-Brown)
• Matrix calculus introduced to social networks in the 1940s and 1950s
• Small-world phenomenon presented and demonstrated in the 1950s and 1960s
• Dynamic networks – 1970s
• First SNA software – 1980s
Dynamic Networks
• Social networks change over time
• A dynamic network N(t) is a social network whose state changes as a
function of time t.
• Dynamic networks may exhibit different kinds of behavior:
• Evolution
• Growth
• Transformation
• Decay
• Termination
• E.g. a family as a network
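One simple way to picture N(t) is a list of ties annotated with the interval during which each tie exists. The sketch below follows the family example; all names and years are invented, and `snapshot` is a hypothetical helper that returns the ties active at time t.

```python
# Hypothetical family ties as (node, node, start_year, end_year).
# end_year=None means the tie is still active.
timed_ties = [
    ("parent_1", "parent_2", 2000, None),
    ("parent_1", "child", 2005, None),
    ("parent_2", "child", 2005, None),
    ("child", "partner", 2030, None),
    ("parent_1", "grandparent", 2000, 2020),
]

def snapshot(t):
    """Return N(t): the set of ties active at time t."""
    return {
        frozenset((u, v))
        for u, v, start, end in timed_ties
        if start <= t and (end is None or t < end)
    }

for year in (2001, 2010, 2035):
    print(year, len(snapshot(year)), "ties")
```

Comparing snapshots shows growth (new ties appear in 2005 and 2030) and decay (the tie to the grandparent ends in 2020).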
Networks
• A network is a set of nodes connected by a set of edges
• Nodes are also called vertices
• Edges are also called links
• Networks are also called graphs
• A node represents an entity
• People
• Cities
• Symptoms
• Psychological construct

Networks
An edge represents some connection between two nodes
• Friendship / contact
• Distance
• Comorbidity
• Causality
• Interaction

Social Networks
• Links denote social “interactions”
• friendship, collaborations, e-mail, etc.
Information networks
• Nodes store information, links associate information
• citation networks, the web, p2p networks, etc.
Technological networks
• Man-built for the distribution of a commodity
• telephone networks, power grids, transportation networks, etc.
Biological networks
• Represent biological systems
• protein-protein interaction networks, gene regulation networks,
metabolic pathways, etc.
Ice Breaking

Coffee time :)

Applications and Challenges in
Social Network Analysis
Applications [1]

• Web as a graph
• Google PageRank
• How to estimate webpage importance from the structure of the web-
graph?
• Routing in peer-to-peer networks:
• BitTorrent, ML-donkey, Kazaa, Gnutella
• Can we find a file in a network without a central server?
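The question of estimating page importance from link structure can be illustrated with a simplified power-iteration sketch in the spirit of PageRank. This is not Google's production algorithm: the damping factor 0.85, the fixed iteration count, the handling of pages without outgoing links, and the four-page `web` are all assumptions for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Page with no outgoing links: spread its rank over all pages.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
        rank = new_rank
    return rank

# Hypothetical four-page web graph.
web = {"home": ["docs", "blog"], "docs": ["home"],
       "blog": ["home", "docs"], "orphan": ["home"]}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page:>6} {ranks[page]:.3f}")
```

A page that nobody links to ("orphan") ends up with only the baseline share, while a page that many others point to ("home") accumulates rank from its in-links.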
Applications [2]
• Marketing and advertising:
• How to define influence?
• How to find influencers?
• Who to give free products to so that we create a network effect?
• Diffusion of information and epidemics:
• How to trace information as it spreads?
• How to efficiently detect epidemics
and information outbreaks?
Applications [3]

• Friend/link prediction:
• How to predict/suggest friends in
networks?
• Trust and distrust:
• How to predict who are your
friends/foes? Who to trust?
• Community detection:
• How to find clusters and small
communities in social networks?
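A common baseline for friend suggestion is to count common neighbours: two people who are not yet connected but share many friends are likely candidates. The sketch below assumes this heuristic (many other scores exist); the five-person network and the helper `suggest_links` are invented for illustration.

```python
from itertools import combinations

# Hypothetical undirected friendship network (each tie listed in both directions).
friends = {
    "ana": {"budi", "citra"},
    "budi": {"ana", "citra", "dewi"},
    "citra": {"ana", "budi", "dewi"},
    "dewi": {"budi", "citra", "eka"},
    "eka": {"dewi"},
}

def suggest_links(graph):
    """Rank unconnected pairs by their number of common neighbours."""
    scores = []
    for u, v in combinations(sorted(graph), 2):
        if v not in graph[u]:
            common = graph[u] & graph[v]
            if common:
                scores.append((len(common), u, v))
    return sorted(scores, reverse=True)

for score, u, v in suggest_links(friends):
    print(f"{u} - {v}: {score} common friend(s)")
```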
Levels of Social Network Analysis
• Nodal/Actor level
• focuses on nodal level attributes and phenomena
• Dyadic level
• focuses on the pairs of nodes
• Triadic level
• focuses on triplets of nodes
• N-adic/Subset level
• focuses on sub-graphs of N nodes
• Network/Group level
• focuses on the whole graph and network-level phenomena
Typically a cross-level analysis, combining all of these levels
Challenges
• Scale
• This work considers “extreme-scale” graphs – billion+ vertices and up to
trillion+ edges
• Processing these graphs requires at least hundreds to thousands of
compute nodes or tens of thousands of cores
• Graph analytic algorithms are generally memory-bound instead of
compute-bound; in the distributed space, this results in a ratio of
communication versus computation that increases with core/node count
Complexity

• Real-world extreme-scale graphs have similar characteristics: small-world nature with skewed degree distributions
• Small-world graphs are difficult to partition for distributed computation or to
optimize in terms of cache due to “too much locality”
• Skewed degree distributions make efficient parallelization and load
balance difficult to achieve
• Multiple levels of cache/memory and increasing reliance on wide
parallelism for modern HPC systems compound the above challenges
Challenges
• Heterogeneity
– Various types of entities and interactions are involved
• Evolution
– Timeliness is emphasized in social media
• Collective Intelligence
– How to utilize wisdom of crowds in forms of tags, wikis, reviews
• Evaluation
– Lack of ground truth, and complete information due to privacy

Social Computing Tasks
• Social Computing: a young and vibrant field
• Conferences: KDD, WSDM, WWW, ICML, AAAI/IJCAI,
SocialCom, etc.
• Tasks
– Network Modeling
– Centrality Analysis and Influence Modeling Our Focus
– Community Detection
– Sentiment Analysis, Classification and Recommendation
– Privacy, Spam and Security

Network Modeling
• Large Networks demonstrate statistical patterns:
• Small-world effect (e.g., 6 degrees of separation)
• Power-law distribution (a.k.a. scale-free distribution)
• Community structure (high clustering coefficient)
• Model the network dynamics
• Reproducing large-scale networks
• Examples: random graph, preferential attachment process, Watts and Strogatz
model
• Simulation to understand network properties
• Thomas Schelling’s famous simulation: What could cause the segregation of white and black
people?
• Network robustness under attack
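As one example of a generative model, the sketch below grows a network by preferential attachment: each new node links to m existing nodes chosen with probability proportional to their current degree. It is a simplified variant written for illustration; the starting graph, the parameter names, and the seeded random generator are assumptions.

```python
import random
from collections import Counter

def preferential_attachment(n, m, seed=0):
    """Grow a graph to n nodes; each new node attaches to m distinct old nodes."""
    rng = random.Random(seed)
    # Start from a complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # A node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        for old in sorted(chosen):
            edges.append((new, old))
            endpoints.extend((new, old))
    return edges

edges = preferential_attachment(50, 2, seed=1)
degree = Counter(node for edge in edges for node in edge)
print("edges:", len(edges))
print("five highest degrees:", [d for _, d in degree.most_common(5)])
```

Because well-connected nodes are more likely to receive new links, a few early nodes tend to become hubs, which is the mechanism usually offered to explain skewed degree distributions.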
Centrality Analysis and Influence Modeling

• Centrality Analysis:
• Identify the most important actors or edges
• E.g. PageRank in Google
• Various other criteria
• Influence modeling:
• How is information diffused?
• How do individuals influence each other?
• Related Problems
• Viral marketing: word-of-mouth effect
• Influence maximization
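A minimal way to experiment with these questions is a threshold model of diffusion: a node adopts once a large enough fraction of its neighbours has adopted. The sketch assumes a single fixed threshold of 0.5 for everyone and a toy six-node network; both are illustrative choices, not a model taken from the lecture.

```python
def threshold_cascade(graph, seeds, threshold=0.5):
    """Spread adoption until no further node reaches the threshold."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, neighbours in graph.items():
            if node in active or not neighbours:
                continue
            fraction = len(neighbours & active) / len(neighbours)
            if fraction >= threshold:
                active.add(node)
                changed = True
    return active

# Hypothetical undirected network (each tie listed in both directions).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d", "f"},
    "f": {"e"},
}

print(sorted(threshold_cascade(graph, {"a"})))       # the cascade stalls
print(sorted(threshold_cascade(graph, {"a", "b"})))  # the cascade reaches everyone
```

The contrast between the two seed sets is the core of influence maximization: a small change in who is seeded can decide whether word of mouth dies out or spreads through the whole network.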

Community Detection
• A community is a set of nodes between which the interactions are (relatively)
frequent
– A.k.a., group, cluster, cohesive subgroups, modules

• Applications: Recommendation based communities, Network Compression, Visualization of a huge network
• New lines of research in social media
– Community Detection in Heterogeneous Networks
– Community Evolution in Dynamic Networks
– Scalable Community Detection in Large-Scale Networks
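One widely used way to quantify "relatively frequent interactions inside a group" is modularity, which compares the share of edges inside each community with the share expected from node degrees alone. Modularity is brought in here only as an illustration and is not named in the slides; the toy graph of two triangles joined by one edge and the helper `modularity` are assumptions.

```python
def modularity(edges, community_of):
    """Modularity Q = sum over communities of (L_c / m) - (d_c / (2m))^2."""
    m = len(edges)
    internal = {}    # L_c: number of edges with both ends in community c
    degree_sum = {}  # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )

# Two triangles joined by the single edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "left", 2: "left", 3: "left", 4: "right", 5: "right", 6: "right"}
one_group = {node: "all" for node in range(1, 7)}

print(round(modularity(edges, two_groups), 4))
print(round(modularity(edges, one_group), 4))
```

Splitting the graph at the bridging edge scores higher than lumping every node into one group, which matches the intuition that the two triangles are separate communities.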

Classification and Recommendation
• Common in social media applications
• Tag suggestion, Product/Friend/Group Recommendation

Link prediction

Network-Based Classification

Privacy, Spam and Security
• Privacy is a big concern in social media
• Facebook, Google Buzz often appear in debates about privacy
• Netflix Prize Sequel cancelled due to privacy concern
• Simple anonymization does not necessarily protect privacy
• Spam blog (splog), spam comments, fake identity, etc., all require new techniques
• As private information is involved, a secure and trustable system is critical
• Need to achieve a balance between sharing and privacy

Sentiment Analysis in
Social Network Analysis
Example of Opinion

“(1) I bought an iPhone a few days ago.
(2) It was such a nice phone.
(3) The touch screen was really cool.
(4) The voice quality was clear too.
(5) However, my mother was mad with me as I did not tell
her before I bought it.
(6) She also thought the phone was too expensive, and
wanted me to return it to the shop. … ”
Positive opinion: sentences (2) to (4)
Negative opinion: sentences (5) and (6)

Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” Springer, 2nd Edition.
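The review above can be scored sentence by sentence with a very small lexicon-based sketch. The five-word lexicon is hand-made for this example (it is not a published resource), and summing word scores is only the simplest possible rule; real systems also handle negation, intensity, and opinion targets.

```python
import re

# Tiny hand-made lexicon (an assumption for this example).
LEXICON = {"nice": 1, "cool": 1, "clear": 1, "mad": -1, "expensive": -1}

review = [
    "I bought an iPhone a few days ago.",
    "It was such a nice phone.",
    "The touch screen was really cool.",
    "The voice quality was clear too.",
    "However, my mother was mad with me as I did not tell her before I bought it.",
    "She also thought the phone was too expensive, and wanted me to return it to the shop.",
]

def sentence_polarity(sentence):
    """Sum lexicon scores of the words and map the total to a label."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(LEXICON.get(word, 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

labels = [sentence_polarity(s) for s in review]
for number, label in enumerate(labels, start=1):
    print(f"({number}) {label}")
```

Note that a sentence-level score says nothing about who holds the opinion: sentences (5) and (6) report the mother's view, not the author's, which is one reason finer-grained analysis is needed.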
Sentiment Analysis vs Subjectivity Analysis

Sentiment Analysis: Positive / Negative / Neutral
Subjectivity Analysis: Subjective (positive or negative) / Objective (neutral)

Levels of Sentiment Analysis

Sentiment Analysis
• Word level Sentiment Analysis
• Sentence level Sentiment Analysis
• Document level Sentiment Analysis
• Feature level Sentiment Analysis

Vishal Kharde and Sheetal Sonawane (2016), "Sentiment Analysis of Twitter Data: A Survey of Techniques,"
International Journal of Computer Applications, Vol 139, No. 11, 2016. pp.5-15
Sentiment Analysis: Tasks, Approaches, and Applications
• Tasks
– Subjectivity Classification
– Sentiment Classification (SC): Polarity Determination, Vagueness resolution in opinionated text, Multi- & Cross-Lingual SC, Cross-domain SC
– Review Usefulness Measurement
– Opinion Spam Detection
– Lexicon Creation
– Aspect Extraction
• Approaches
– Machine Learning based
– Lexicon based
– Hybrid approaches
– Ontology based
– Non-Ontology based
• Application

Source: Kumar Ravi and Vadlamani Ravi (2015), "A survey on opinion mining
and sentiment analysis: tasks, approaches and applications." Knowledge-Based Systems,
89, pp.14-46.

Sentiment Classification Techniques
Sentiment Analysis
• Machine Learning Approach
– Supervised Learning: Decision Tree Classifiers; Linear Classifiers (Support Vector Machine (SVM), Neural Network (NN), Deep Learning (DL)); Rule-based Classifiers; Probabilistic Classifiers (Naïve Bayes (NB), Bayesian Network (BN), Maximum Entropy (ME))
– Unsupervised Learning
• Lexicon-based Approach
– Dictionary-based Approach
– Corpus-based Approach: Statistical; Semantic
Source: Jesus Serrano-Guerrero, Jose A. Olivas, Francisco P. Romero, and Enrique Herrera-Viedma (2015),
"Sentiment analysis: A review and comparative analysis of web services," Information Sciences, 311, pp. 18-38.
Features and Metrics in Social Network

• Types of User Generated content


• Profile-based social networks: focused on the users and on their desire to express
themselves and communicate with their contacts (e.g., Facebook, MySpace); among
social networks of this type, Facebook is the first resource concerning the social sphere
of people (friends, family, etc.), where individuals share content especially about their
private lives, personal interests, and activities;
• Microblogging social networks: focused on the shared message, which has to be short
and clear (e.g., Twitter). Twitter is the most famous one, and is often described as a site
of “amateur journalism” [20] where people share content especially about specific and
current events and situations;
• Content-based social networks: focused on the content posted by users (e.g., YouTube,
Flickr, Instagram).
Features and Metrics in Social Network

• Types of Relationship between users


• Two-way, or “friendship” (e.g., Facebook): allows users who are friends with each
other to access their friends’ profiles, contact them directly through a private chat (i.e.,
Messenger), read new messages on their bulletin board, explore their social network,
and know the actions within the social network (i.e., membership in groups, places
visited, etc.).
• “Star” (e.g., Twitter): this clearly distinguishes between sender and receiver. The
message can be general (i.e., shared with all the receivers on the social
network) or individual (i.e., directed to a specific receiver).
How can Social Network Analytics improve
Sentiment Analysis in SNA?

• Sentiment Analysis is one of the most used methods for analyzing data collected
through online social networks → it can investigate the opinions and attitudes expressed
online by means of natural language processing tools
• Problem: it does not allow one to consider data within the online network in which they
have been collected.
• Social Network Analysis, through a quantitative-relational approach, makes it possible to
consider data as “networked” (i.e., considering existing connections and links between
users).
How to Integrate SNA in SA?

• The properties of the linkage between individuals on online social networks are critical to
an understanding of the process of social influence through them → represented by the
sociological construct of tie strength, which captures the strength of dyadic
interpersonal relationships in the context of social networks
• A comment between two people is useful for predicting edge signs and may be used to fit a
conventional sentiment model. A purely edge feature–based sentiment model cannot
account for the network structure, since it reasons about edges as independent of each
other.
Conclusions
• Network:
• Scale
• Structure
• Information Diffusion
• Challenges
• Heterogeneity
• Evolution
• Evaluation
References

• Community Detection and Mining in Social Media. Lei Tang and Huan Liu,
Morgan & Claypool, September, 2010.
• Newman, ME, The Structure and Function of Complex Networks, SIAM
2003
• Watts, DJ; Strogatz, S H. 1998.
Collective dynamics of 'small-world' networks, NATURE 393(668).
• Barabási, AL. Network Science
CII-4O3
Social Network Analysis

Thank You
