
Social Network Analysis

江彥生
Central questions
• Where do we find networks?
• Why do we care?
• How do we analyze or model them?
About this course
• A metaphor: driver, car repair person, automobile engineer
Backgrounds of participants
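• Motives
  • Learn how to use the tools in practice (40.4%)
  • Understand the basic concepts of social network analysis as a resource for future applications (23.1%)
• Knowledge about R
  • Basic grasp to familiar (33.1%)
  • Need reminders (23.1%)
  • No experience (43.8%)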
What is a network?
Components
Nodes (vertices, dots, etc.)
Edges (links, connections, ties, etc.)
What is this network?
US high school student friendship network
What is this network?
US high school student romance network
What is this network?
Internet (links between web-pages)
What is this network?
Academic collaboration network
What is this network?
Air route network
What is this network?
Brain network
Where can we find a network?
- It is ubiquitous in our lives:
- Friendships
- Sexual relationships
- Organizational alliances
- International relations
- Neuron structure of the brain
- Semantic networks
- Traffic
- Spread of viruses
- …….
Central Themes
- A set of entities
- Relationships among these entities
- Structure
- Research questions
- How can we describe the structure?
- How does the structure explain a social phenomenon?
Topics
- Examples
- Spread of information, rumor, diseases, etc.
- Traffic flow (such as flight routes)
- Formation of friendship, business alliance, etc.
- Interpersonal trust or conflicts
- Dating/marriage market
- Inter-organizational competition
- International alliance
- …..
Outline of the Course
• Part 1—Network Data
• Part 2—Network Graphs, Visualizations, Two-mode networks
• Part 3—Measuring networks
• Part 4—Exponential Random Graph Models
• Part 5—Simulation Modeling
Conceptual Maps
• Part 1—Introduction, R basics, Network Data
• Ego-centric data, respondent driven sampling, etc.
• Why do we care?
• What kind of information can networks tell us?
• How to collect and code network data?
• Part 2—Network Graphs, Visualizations, Two-mode networks
• Layout algorithms, dynamic visualization, one-mode projection methods
• How can we graphically illustrate a network?
• What kind of information can we learn from two-mode networks?
Conceptual Maps
• Part 3—Measuring networks
• Brokerage, clustering, centrality, cohesiveness, etc.
• How do we distinguish one node from another in a network?
• How do we quantify the structural characteristics of networks?
• Part 4—Exponential Random Graph Models
• The p* model, dyadic dependence, network specifications, etc.
• What is the mechanism underlying the formation of a network?
• How can we be sure that an observed statistic is not due to chance?
Conceptual Maps
• Part 5—Simulation Modeling
• Diffusion, threshold models, cultural drift, structural balance, etc.
• How can we use networks to model social phenomena?
• How do we model the diffusion of innovation, ideas and norms in social networks?
• How do we explain the emergence of polarization in ideology/cultural preferences in networks?
Network Data
• Real world networks
• Sources
• Survey
• Observation
• Online
• Historical archive
• ….
• Formation mechanism is usually unclear
Real Data: whole-view vs. ego-centric
• Ego-centric networks
• Networks around one individual
• Example: a person’s LINE friends

• Whole view networks


• A complete structure of the whole set of entities
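
The difference between the two views can be sketched in a few lines of Python. This is only an illustration under assumed data: the five names and their ties are invented, and `ego_network` is a hypothetical helper, not a function from any package used in the course. It keeps one person (the ego), that person's direct contacts, and the ties among them, and drops everything else in the whole network.

```python
# Hypothetical whole-view network: each person maps to the set of their contacts.
whole_network = {
    "Ann": {"Ben", "Cat"},
    "Ben": {"Ann", "Cat", "Dan"},
    "Cat": {"Ann", "Ben"},
    "Dan": {"Ben", "Eve"},
    "Eve": {"Dan"},
}


def ego_network(network, ego):
    """Return the ego, the ego's direct contacts, and the ties among them."""
    members = {ego} | network[ego]
    # Keep only ties whose both ends are inside the ego's neighborhood.
    return {person: network[person] & members for person in members}


ann_view = ego_network(whole_network, "Ann")
for person in sorted(ann_view):
    print(person, sorted(ann_view[person]))
```

Dan and Eve do not appear in Ann's ego-centric network, although they are part of the whole-view structure.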
Whole vs. Partial
• Ego-centric networks
• When do we use it?
• Cannot access the whole population
• Research does not require a whole network view
• Individual perspective
• E.g., how does social support influence a person’s well-being?

• Whole view networks


• When do we use it?
• Capable of accessing the whole population
• Research requires a systematic view of it
• E.g., how does a rumor spread in an organization?
Data Format for Analyses
• Node-Edge list
• Nodes: {Albert, Bob, Cindy, David, …}
• Edges: Cindy -> Bob, David <-> Helen, …
• Matrix
• Like a spreadsheet
• More complicated entries
• Example
• Interlocking directorate
• Human-ecological networks
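
To make the two formats concrete, here is a minimal Python sketch that converts a node-edge list into an adjacency matrix. The names follow the slide's example, but the full node list and the helper `to_adjacency_matrix` are assumptions made for illustration. Rows are senders and columns are receivers, so a one-way tie fills one cell and a mutual tie fills two.

```python
# Hypothetical node and edge lists, loosely following the slide's example names.
nodes = ["Albert", "Bob", "Cindy", "David", "Helen"]
directed_edges = [("Cindy", "Bob")]    # Cindy -> Bob (one-way tie)
mutual_edges = [("David", "Helen")]    # David <-> Helen (reciprocated tie)


def to_adjacency_matrix(nodes, directed_edges, mutual_edges):
    """Build an n-by-n 0/1 matrix: rows are senders, columns are receivers."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for sender, receiver in directed_edges:
        matrix[index[sender]][index[receiver]] = 1
    for a, b in mutual_edges:
        # A mutual tie is recorded in both directions.
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1
    return matrix


adjacency = to_adjacency_matrix(nodes, directed_edges, mutual_edges)
for name, row in zip(nodes, adjacency):
    print(f"{name:<7}", row)
```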
Example of two-mode data
• Interlocking directorate
• Managers serve on multiple boards of directors
• Managers vs. managers
• Firms vs. firms
• Human ecological relationship
• Villagers forage in different forests
• Villagers vs. villagers
• Forests vs. forests
Why use ‘matrix’ to represent networks?
• Intuitive and reasonable
• n by n matrix (spreadsheet)
• Matrix can be operated mathematically (linear algebra)
• Example:
• matrix of friendship
• A matrix times itself
• Shows who is a friend of a friend, two steps away (even if the two are not friends themselves)

• Two-mode network
• A matrix times its transpose
• Shows how many affiliations two persons share
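
Both matrix operations can be tried out with plain Python lists. The sketch below uses invented data: a four-person friendship chain and three managers sitting on three firms' boards. `matmul` and `transpose` are small helpers written here for illustration; in practice a linear algebra library would do the same job.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(column) for column in zip(*m)]


# One-mode example (hypothetical): a friendship chain Ann - Ben - Cat - Dan.
people = ["Ann", "Ben", "Cat", "Dan"]
friends = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Entry (i, j) of the squared matrix counts the two-step paths from i to j.
two_step = matmul(friends, friends)
friends_of_friends = {
    people[i]: [people[j] for j in range(len(people))
                if i != j and two_step[i][j] > 0 and friends[i][j] == 0]
    for i in range(len(people))
}

# Two-mode example (hypothetical): rows are managers, columns are firms' boards.
affiliation = [
    [1, 1, 0],  # manager M1 sits on the boards of firms A and B
    [1, 1, 1],  # manager M2 sits on the boards of firms A, B and C
    [0, 0, 1],  # manager M3 sits on the board of firm C
]
# Manager-by-manager projection: number of boards each pair shares.
shared_boards = matmul(affiliation, transpose(affiliation))
# Firm-by-firm projection: number of managers each pair of firms shares.
shared_managers = matmul(transpose(affiliation), affiliation)

print(friends_of_friends)
print(shared_boards)
print(shared_managers)
```

The diagonal of each projection counts a unit's own affiliations (for example, how many boards a manager sits on), so it is usually ignored when reading off shared ties.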
Data Collection/Mining
• Survey
• General population
• Villages
• Organization
• School, company, etc.
• Historical archive
• Erickson and Bearman (2004)
• Gould (1991)
• 葉高華 (2016)
• Digital records
• Data crawling
• Text mining
Creative ways to reveal social networks

• Criminal network? How?
• Interview gangsters, prisoners, etc.
• Murder networks
• Papachristos (2009)
Creative ways to reveal social networks

• Dolphins’ social networks (marine biologists’ observations near New Zealand)
Creative ways to reveal social networks
• Participant observation
• Freeman and Webster (1994)
• Imagine that you are watching kids playing in a playground: who are playmates?


Other creative ways?
• Other creative or effective ways of collecting network data?
• MRT pass (EasyCard)
• Wedding invitation card
• Movie actor list
• Newspaper reports
• Phone call records
• ….
Hidden Population
Snowball sampling
• Useful to reach “hidden” populations
• Examples
  • Prostitutes
  • Homeless people
  • Drug users
  • Gangsters
  • Movie stars?
  • …..
• Methods to get in touch with these populations
  • Snowball sampling
    • Problems: initial sample is critical; biased toward cooperators
  • Key informant sampling
    • Example: social workers
    • Problem: institutional biases
  • Target sampling
    • Example: interview housemaids in parks on Sundays
    • Problem: most hidden people are invisible in public
Respondent driven sampling (RDS)
• Heckathorn (2007)
• An adaptation of snowball sampling
• Emphasizes the incentives given to respondents for recruiting their peers
• Incentives
• Primary: respondent is rewarded for finishing the interview/questionnaire
• Secondary: respondent is rewarded for recruiting peers

• Steps
• Select a set of seeds from the target population
• For each seed, collect their information
• Recruit the next targets through the current target (providing incentives)
• Repeat until the sample size is met
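
The recruitment steps can be mimicked with a small simulation. The following Python sketch is an illustration under stated assumptions, not Heckathorn's procedure: the 30-person ring network, the two seeds, the limit of two coupons per respondent, and the function `simulate_rds` are all invented here. For simplicity each person can be recruited only once.

```python
import random
from collections import deque


def simulate_rds(neighbors, seeds, sample_size, coupons=2, rng=None):
    """Recruit wave by wave: each respondent passes coupons to unsampled peers."""
    rng = rng or random.Random(0)
    sampled = []          # respondents in order of recruitment
    recruiter_of = {}     # respondent -> who recruited them (None for seeds)
    queue = deque()
    for seed in seeds:    # Step 1: start from the seeds
        if seed not in recruiter_of and len(sampled) < sample_size:
            recruiter_of[seed] = None
            sampled.append(seed)
            queue.append(seed)
    while queue and len(sampled) < sample_size:   # Step 4: stop at the target size
        current = queue.popleft()                 # Step 2: "interview" this respondent
        candidates = [p for p in sorted(neighbors[current]) if p not in recruiter_of]
        rng.shuffle(candidates)
        for peer in candidates[:coupons]:         # Step 3: recruit peers with coupons
            if len(sampled) >= sample_size:
                break
            recruiter_of[peer] = current
            sampled.append(peer)
            queue.append(peer)
    return sampled, recruiter_of


# Hypothetical population: 30 people on a ring, each tied to the 2 nearest on each side.
n = 30
neighbors = {i: {(i + d) % n for d in (-2, -1, 1, 2)} for i in range(n)}

sampled, recruiter_of = simulate_rds(neighbors, seeds=[0, 15], sample_size=12)
print(sampled)
print({person: recruiter_of[person] for person in sampled})
```

The `recruiter_of` mapping records the recruitment chains, which is the information later adjustment methods rely on.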
Does it work?
• Heckathorn (2007)
• How do incentives work?
• Recruitment becomes more effective
• Coupons reduce privacy concerns
• Facilitates multiple waves of recruitment
• One key weakness of snowball sampling is that the initial sample must be representative enough!
• Yet simulation modeling shows that after a couple of recruitment waves the sample composition converges to an equilibrium and becomes less and less sensitive to the initial seeds
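
The convergence claim can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, a population with two groups and made-up probabilities that a recruiter from one group recruits someone from each group; it then tracks the expected group composition wave by wave from two opposite sets of seeds.

```python
def next_wave(composition, transition):
    """Expected group shares among recruits, given the shares among recruiters."""
    groups = range(len(composition))
    return [sum(composition[i] * transition[i][j] for i in groups) for j in groups]


def composition_after(seed_composition, transition, waves):
    composition = list(seed_composition)
    for _ in range(waves):
        composition = next_wave(composition, transition)
    return composition


# Hypothetical recruitment probabilities (each row sums to 1):
# a group A recruiter picks A with 0.7 and B with 0.3; a group B recruiter picks A with 0.4.
transition = [
    [0.7, 0.3],
    [0.4, 0.6],
]

from_group_a = composition_after([1.0, 0.0], transition, waves=10)  # all seeds from A
from_group_b = composition_after([0.0, 1.0], transition, waves=10)  # all seeds from B

print([round(share, 4) for share in from_group_a])
print([round(share, 4) for share in from_group_b])
```

With these assumed probabilities both runs approach the same shares (4/7 for group A and 3/7 for group B), even though the seeds were drawn entirely from opposite groups.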
Simulation Test
Simulation result
Respondent driven sampling: Concerns
• The RDS method is valid under the following assumptions:
• Each participant reports accurate network information
• Each participant randomly recruits his/her network neighbors (with replacement)
• Homophily bias is equal across groups (White, Black, Hispanic, etc.)
• Violation of the assumptions would increase the standard errors of the estimates
• Need to make adjustments to the estimates
• Recent development:
• Heckathorn and Cameron (2017)
• Baraff, McCormick and Raftery (2016)
Demonstration of the “tree bootstrap method”
• Imagine you recruit gang members
• You want to assess the percentage of drug addiction in the gang

• Step 1: Complete RDS data collection (see A)
• Step 2: Resample data (bootstrap) along the “tree” (see B)
• Step 3: After a few rounds of tree bootstrapping, calculate the statistic of interest (drug addiction)
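
The three steps can be sketched in Python as follows. This is a simplified reading of the tree bootstrap idea, not the published implementation: the recruitment tree, the 0/1 addiction indicator, and the function names are invented, and the statistic is a plain unweighted proportion. Seeds are resampled with replacement, and then, moving down the tree, each selected person's recruits are resampled with replacement.

```python
import random

# Step 1 (assumed data): a finished RDS recruitment tree and a 0/1 trait per person.
seeds = ["s1", "s2"]
recruits = {"s1": ["a", "b"], "a": ["c"], "s2": ["d", "e"], "e": ["f", "g"]}
addicted = {"s1": 1, "a": 0, "b": 1, "c": 0, "s2": 0, "d": 1, "e": 0, "f": 0, "g": 1}


def resample_tree(node, recruits, rng):
    """Step 2: keep the node, then redraw its recruits with replacement, recursively."""
    drawn = [node]
    children = recruits.get(node, [])
    for _ in children:
        child = rng.choice(children)
        drawn.extend(resample_tree(child, recruits, rng))
    return drawn


def tree_bootstrap(seeds, recruits, trait, rounds=1000, rng=None):
    """Step 3: repeat the resampling and compute the statistic in every round."""
    rng = rng or random.Random(42)
    estimates = []
    for _ in range(rounds):
        sample = []
        for _ in seeds:                      # redraw seeds with replacement
            sample.extend(resample_tree(rng.choice(seeds), recruits, rng))
        estimates.append(sum(trait[p] for p in sample) / len(sample))
    return estimates


estimates = tree_bootstrap(seeds, recruits, addicted)
ordered = sorted(estimates)
lower = ordered[int(0.025 * len(ordered))]
upper = ordered[int(0.975 * len(ordered))]
print(f"95% percentile interval for the addiction share: {lower:.2f} to {upper:.2f}")
```

The spread of the bootstrap estimates gives an interval that reflects the dependence between recruiters and their recruits, which an ordinary bootstrap over individuals would ignore.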
Respondent driven sampling: Adjustment
• How do we know which adjustment method works better?
• Use given network data sets and then simulate the RDS methods to see which method renders a more accurate estimate of the population statistic
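
A minimal Python sketch of this kind of comparison follows. Everything in it is an assumption made for illustration: the 60-person network, the trait that only well-connected "hub" members carry, and the choice of estimators. It compares the raw sample share with an inverse-degree weighted share (a weighting commonly associated with the Volz-Heckathorn or RDS-II estimator, which the slides do not name), using a random walk with replacement as a stand-in for RDS recruitment.

```python
import random

# Hypothetical known network: 60 people on a ring, plus 5 extra ties for each of 10 hubs.
N = 60
HUBS = range(10)
neighbors = {i: {(i - 1) % N, (i + 1) % N} for i in range(N)}
for hub in HUBS:
    for block in range(1, 6):
        other = 10 * block + hub
        neighbors[hub].add(other)
        neighbors[other].add(hub)

# Assumed trait: only hubs carry it, so the trait is correlated with degree.
trait = {i: 1 if i in HUBS else 0 for i in range(N)}
true_share = sum(trait.values()) / N


def simulate_rds_walk(neighbors, start, steps, rng):
    """Idealized RDS: each respondent recruits one random neighbor, with replacement."""
    sample, current = [], start
    for _ in range(steps):
        current = rng.choice(sorted(neighbors[current]))
        sample.append(current)
    return sample


def naive_estimate(sample, trait):
    return sum(trait[p] for p in sample) / len(sample)


def inverse_degree_estimate(sample, trait, neighbors):
    """Down-weight well-connected respondents, who are reached more often."""
    weights = [1 / len(neighbors[p]) for p in sample]
    return sum(w * trait[p] for w, p in zip(weights, sample)) / sum(weights)


rng = random.Random(2024)
sample = simulate_rds_walk(neighbors, start=30, steps=5000, rng=rng)
naive = naive_estimate(sample, trait)
adjusted = inverse_degree_estimate(sample, trait, neighbors)
print(f"true share {true_share:.3f}, naive {naive:.3f}, adjusted {adjusted:.3f}")
```

Because the population value is known in a simulation, each candidate adjustment can be scored by how far its estimate falls from the truth. Real comparisons would repeat this over many simulated samples and several networks.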
A choice of network analysis and visualization software

• Consideration
• Learning curve
• GUI friendliness vs. programming flexibility
• Aesthetic requirement
• Computing efficiency

• Different paths toward the same destination

• Reference: Combe et al. (2010)


Options

• UCINET (https://sites.google.com/site/ucinetsoftware/home)

• Pajek (http://vlado.fmf.uni-lj.si/pub/networks/pajek/)

• Gephi (https://gephi.org/)

• NetworkX (https://networkx.github.io/)

• igraph (http://igraph.org/redirect.html)
* Note that other R packages are available for analyzing and modeling network data, such as sna, statnet, etc.
Reference
• I use the R code from the following reference
Network data available to play with
• Network data repository *
