Part 1
Social Network Analysis
江彥生
Central questions
• Where are networks found?
• Why do we care?
• How do we analyze or model them?
About this course
• A metaphor: driver, car repair person, automobile engineer
Backgrounds of participants
• Motives
• Knowledge about R
• Two-mode network
• Multiply the person-by-affiliation matrix by its transpose
• The product shows how many affiliations two persons have in common
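The projection described above can be sketched in a few lines of plain Python. The people, groups, and matrix entries are made up for illustration, and `transpose` and `matmul` are small helpers written here, not part of any library.

```python
# Hypothetical person-by-affiliation (two-mode) matrix:
# rows are people, columns are groups, 1 means "is a member".
people = ["Ann", "Bob", "Cat"]
groups = ["Chess", "Choir", "Rugby"]
A = [
    [1, 1, 0],  # Ann: Chess, Choir
    [1, 1, 1],  # Bob: Chess, Choir, Rugby
    [0, 0, 1],  # Cat: Rugby
]

def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    """Multiply two list-of-lists matrices."""
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]

# A times its transpose gives a person-by-person (one-mode) matrix.
co_membership = matmul(A, transpose(A))

for name, row in zip(people, co_membership):
    print(name, row)
```

Each off-diagonal cell counts the groups two people share (Ann and Bob share two), and each diagonal cell counts how many groups one person belongs to.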
Data Collection/Mining
• Survey
• General population
• Villages
• Organization
• School, company, etc.
• Historical archive
• Erickson and Bearman (2004)
• Gould (1991)
• 葉高華 (2016)
• Digital records
• Data crawling
• Text mining
Creative ways to reveal social networks
• Steps
• Select a set of seeds (initial targets)
• For each seed, collect their information
• Have each current target recruit the next targets (providing incentives)
• Repeat until the target sample size is met
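The steps above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the procedure used in any cited study: the ring-shaped network, the `coupons` limit, and the function name `chain_referral_sample` are all hypothetical, and each person is recruited at most once.

```python
import random
from collections import deque

def chain_referral_sample(neighbors, seeds, target_size, coupons=2, rng=None):
    """Recruit wave by wave from the seeds until target_size is reached."""
    rng = rng or random.Random(0)
    sampled = []    # respondents in recruitment order
    recruiter = {}  # respondent -> who recruited them (None for seeds)
    queue = deque()
    for s in seeds:
        if s not in recruiter and len(sampled) < target_size:
            recruiter[s] = None
            sampled.append(s)
            queue.append(s)
    while queue and len(sampled) < target_size:
        current = queue.popleft()
        # Only contacts who are not yet in the sample can be recruited.
        candidates = [n for n in neighbors[current] if n not in recruiter]
        rng.shuffle(candidates)
        for n in candidates[:coupons]:  # each respondent hands out a few coupons
            if len(sampled) >= target_size:
                break
            recruiter[n] = current
            sampled.append(n)
            queue.append(n)
    return sampled, recruiter

# Hypothetical network: 30 people on a ring, each tied to the 2 nearest on each side.
neighbors = {i: [(i + d) % 30 for d in (-2, -1, 1, 2)] for i in range(30)}
sampled, recruiter = chain_referral_sample(neighbors, seeds=[0, 15], target_size=12)
print(sampled)
```

The `recruiter` dictionary keeps the recruitment tree, which is the information later adjustment methods rely on.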
Does it work?
• Heckathorn (2007)
• How do incentives work?
• Recruitment becomes more effective
• Coupons reduce privacy concerns
• Incentives facilitate multiple waves of recruitment
• A key weakness of snowball sampling is that the initial sample must be reasonably representative!
• Yet simulation modeling shows that after a few recruitment waves the sample composition converges to an equilibrium and becomes less and less sensitive to the initial seeds
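The convergence claim can be illustrated with a simplified sketch that treats recruitment between two groups as a Markov chain. The transition probabilities in `T` are invented for illustration, and this is not the simulation shown in the slides.

```python
# Hypothetical recruitment probabilities between two groups A and B:
# T[i][j] is the chance that a recruiter in group i recruits someone in group j.
T = [
    [0.7, 0.3],
    [0.4, 0.6],
]

def next_wave(composition, transition):
    """Expected group shares in the next wave, given the current shares."""
    k = len(composition)
    return [sum(composition[i] * transition[i][j] for i in range(k)) for j in range(k)]

def simulate(seed_composition, waves):
    """Track the expected group shares over successive recruitment waves."""
    history = [seed_composition]
    for _ in range(waves):
        history.append(next_wave(history[-1], T))
    return history

from_a = simulate([1.0, 0.0], 10)  # all seeds drawn from group A
from_b = simulate([0.0, 1.0], 10)  # all seeds drawn from group B

for wave, (a, b) in enumerate(zip(from_a, from_b)):
    print(wave, round(a[0], 4), round(b[0], 4))
```

With these made-up probabilities, both runs approach the same share of group A (4/7) within a handful of waves, even though the seeds were chosen from opposite groups.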
Simulation Test
Simulation result
Respondent driven sampling: Concerns
• The RDS method is valid under the following assumptions:
• Each participant reports accurate network information
• Each participant recruits at random from his/her network neighbors (with replacement)
• Homophily bias is equal across groups (White, Black, Hispanic, etc.)
• Violating these assumptions increases the standard errors of the estimates
• The estimates then need to be adjusted
• Recent developments:
• Heckathorn and Cameron (2017)
• Baraff, McCormick and Raftery (2016)
Demonstration of the “tree bootstrap method”
• Imagine you recruit gang members
• You want to estimate the percentage of drug addiction in the gang
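A rough sketch of the tree bootstrap idea, as commonly described, is to resample the seeds with replacement and then, for each resampled person, resample that person's recruits with replacement, repeating down the recruitment tree. The recruitment tree and addiction indicators below are invented, and the plain sample proportion is used as a stand-in for a proper RDS estimator.

```python
import random

# Hypothetical recruitment records: who recruited whom, plus a made-up
# drug-addiction indicator (1 = addicted) for every respondent.
seeds = ["s1", "s2"]
recruits = {"s1": ["a", "b"], "a": ["c"], "s2": ["d", "e"], "e": ["f", "g"]}
addicted = {"s1": 1, "a": 0, "b": 1, "c": 0, "s2": 0, "d": 1, "e": 0, "f": 1, "g": 0}

def tree_bootstrap_sample(seeds, recruits, rng):
    """Draw one bootstrap replicate by resampling level by level down the tree."""
    sample = []

    def visit(node):
        sample.append(node)
        children = recruits.get(node, [])
        # Resample this node's recruits with replacement, then recurse.
        for _ in range(len(children)):
            visit(rng.choice(children))

    # Resample the seeds with replacement.
    for _ in range(len(seeds)):
        visit(rng.choice(seeds))
    return sample

def proportion(sample):
    """Unweighted share of addicted respondents (a stand-in estimator)."""
    return sum(addicted[p] for p in sample) / len(sample)

rng = random.Random(42)
estimates = sorted(
    proportion(tree_bootstrap_sample(seeds, recruits, rng)) for _ in range(1000)
)
# Approximate 95% percentile interval from the 1000 replicates.
low, high = estimates[25], estimates[974]
print(round(low, 3), round(high, 3))
```

Resampling along the tree keeps recruits attached to their recruiters, so the replicates reflect the dependence that chain-referral recruitment creates.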
Respondent driven sampling: Adjustment
• How do we know which adjustment method works better?
• Considerations
• Learning curve
• GUI friendliness vs. programming flexibility
• Aesthetic requirement
• Computing efficiency
• UCINET (https://sites.google.com/site/ucinetsoftware/home)
• Pajek (http://vlado.fmf.uni-lj.si/pub/networks/pajek/)
• Gephi (https://gephi.org/)
• NetworkX (https://networkx.github.io/)
• igraph (http://igraph.org/redirect.html)
* Note that other R packages are available for analyzing and modeling network data, such as sna and statnet.
Reference
• I use the R code from the following reference
Network data available to play with
• Network data repository *