Network Science Reading List
Network science is a huge field spanning many disciplines; for newcomers, it is hard to know where to start. What follows is an incomplete list of network science papers I found to be interesting, organized by topic.
Exponential Random Graph Models
ERGMs are the most widely used network models in the social sciences. They model relational data through statistics like the numbers of triangles and k-star subgraphs. Unfortunately, they are difficult to fit and interpret.
- Holland, P. W. and Leinhardt, S. (1981), “An Exponential Family of Probability Distributions for Directed Graphs,” J. Am. Stat. Assoc., 76, 33-50
- Anderson, C. J., Wasserman, S., and Crouch, B. (1999), “A p* Primer: Logit Models for Social Networks,” Soc. Networks, 21, 37-66
- Snijders, T. A. B. (2002), “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models,” J. Soc. Struct., 3, 1-40
- Handcock, M. S. (2003), “Assessing Degeneracy in Statistical Models of Social Networks,” Working paper no. 39, Center for Statistics and the Social Sciences, University of Washington-Seattle
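To make "statistics like the numbers of triangles and k-star subgraphs" concrete: an ERGM sets P(Y = y) proportional to exp(θ · s(y)), where s(y) is a vector of graph statistics. The sketch below assumes an undirected graph with nodes 0..n-1, edges given as unique pairs, and the statistics (edges, 2-stars, triangles); the parameter values in the usage are hypothetical. It computes the normalizing constant by brute-force enumeration, which is only feasible for tiny graphs and is the reason MCMC estimation is used in practice.

```python
from itertools import combinations
import math

def ergm_stats(n, edges):
    """Return (edges, 2-stars, triangles) for an undirected graph on nodes 0..n-1.

    `edges` must be a list of unique unordered pairs (i, j)."""
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    n_edges = len(edges)
    # A 2-star is a node together with two of its neighbours.
    two_stars = sum(math.comb(len(nbrs), 2) for nbrs in adj)
    # A triangle is a triple of mutually connected nodes.
    triangles = sum(1 for i, j, k in combinations(range(n), 3)
                    if j in adj[i] and k in adj[i] and k in adj[j])
    return n_edges, two_stars, triangles

def log_weight(theta, stats):
    """Unnormalized log-probability theta . s(y)."""
    return sum(t * s for t, s in zip(theta, stats))

def exact_prob(n, edges, theta):
    """Exact ERGM probability of one graph, normalizing by enumerating
    all 2^(n choose 2) graphs. Only usable for very small n."""
    dyads = list(combinations(range(n), 2))
    terms = []
    for mask in range(2 ** len(dyads)):
        graph = [d for b, d in enumerate(dyads) if (mask >> b) & 1]
        terms.append(log_weight(theta, ergm_stats(n, graph)))
    top = max(terms)  # log-sum-exp for numerical stability
    log_z = top + math.log(sum(math.exp(t - top) for t in terms))
    return math.exp(log_weight(theta, ergm_stats(n, edges)) - log_z)
```

For example, a triangle on nodes 0, 1, 2 plus an isolated node 3 has statistics (3, 3, 1). With n = 4 there are only 2^6 = 64 graphs to enumerate; with n = 30 there are already 2^435.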
Latent Space Models
Latent space models are an alternative to ERGMs that get around dyadic dependence by positing the existence of latent covariates: given the latent positions, ties are modeled as independent. Since their introduction in 2002, they have been extended to include clustering and degree heterogeneity. Beware that these models impose a triangle inequality on social space, which may not be appropriate.
- Hoff, P. D., Raftery, A. E., and Handcock, M. S. (2002), “Latent Space Approaches to Social Network Analysis,” J. Am. Stat. Assoc., 97, 1090-1098
- Handcock, M. S., Raftery, A. E., and Tantrum, J. M. (2007), “Model-Based Clustering for Social Networks,” J. R. Statist. Soc. A, 170, 301-354
- Krivitsky, P. N., Handcock, M. S., Raftery, A. E., and Hoff, P. D. (2009), “Representing Degree Distributions, Clustering, and Homophily in Social Networks with Latent Cluster Random Effects Models,” Soc. Networks, 31, 204-213
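A minimal sketch of the distance form of a latent space model, assuming no observed covariates: logit P(y_ij = 1) = α - ||z_i - z_j||. Drawing the positions from a standard normal is an illustrative choice here, not the prior of any particular paper, and the function names are hypothetical. Because the log-odds depend on a metric distance, the triangle inequality gives logit p_ik ≥ logit p_ij + logit p_jk - α: if i is close to j and j is close to k, then i cannot be far from k, which is where the built-in transitivity comes from.

```python
import math
import random

def tie_prob(alpha, zi, zj):
    """Tie probability in a latent distance model without covariates:
    logit P(y_ij = 1) = alpha - ||z_i - z_j||."""
    eta = alpha - math.dist(zi, zj)
    return 1.0 / (1.0 + math.exp(-eta))

def simulate_latent_space(n, alpha, dim=2, seed=0):
    """Draw positions z_i ~ N(0, I) (illustrative assumption), then draw
    each tie independently given the positions."""
    rng = random.Random(seed)
    z = [tuple(rng.gauss(0.0, 1.0) for _ in range(dim)) for _ in range(n)]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if rng.random() < tie_prob(alpha, z[i], z[j])]
    return z, edges

def logit(p):
    return math.log(p / (1.0 - p))
```

For any three nodes, `logit(tie_prob(alpha, z[i], z[k]))` is at least `logit(tie_prob(alpha, z[i], z[j])) + logit(tie_prob(alpha, z[j], z[k])) - alpha`, which is the constraint on social space that the paragraph warns about.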
Block Models
Block models are another class of network models involving latent variables. While work in the 1980s assumed the block structure was known, the current approach is to assume each node belongs to an unknown class, and the node’s behavior is determined by its class membership. Bickel and Chen have shown it is possible to recover the unknown class labels consistently if the network is large enough.
- Holland, P. W., Laskey, K. B., and Leinhardt, S. (1983), “Stochastic Blockmodels: First Steps,” Soc. Networks, 5, 109-137
- Airoldi, E. M., Blei, D. M., Fienberg, S. E., and Xing, E. P. (2008), “Mixed Membership Stochastic Blockmodels,” J. Mach. Learn. Res., 9, 1981-2014
- Bickel, P. and Chen, A. (2009), “A Nonparametric View of Network Models and Newman-Girvan and Other Modularities,” P. Natl. Acad. Sci., 106, 21068-21073
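A minimal sketch of the stochastic block model described above: each node draws a class from probabilities `pi`, and a tie between nodes of classes a and b appears independently with probability `B[a][b]`. The function names and the example parameters are hypothetical. The density estimate below cheats by using the true labels; in practice the labels are unknown, and recovering them is exactly the problem studied by Bickel and Chen.

```python
import random

def simulate_sbm(n, pi, B, seed=0):
    """Simulate an undirected SBM: labels ~ Categorical(pi), and
    P(tie between i and j) = B[label_i][label_j], independently."""
    rng = random.Random(seed)
    labels = rng.choices(range(len(pi)), weights=pi, k=n)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if rng.random() < B[labels[i]][labels[j]]]
    return labels, edges

def block_densities(labels, edges, K):
    """Empirical tie density within and between blocks, given known labels."""
    sizes = [labels.count(k) for k in range(K)]
    counts = [[0] * K for _ in range(K)]
    for i, j in edges:
        a, b = labels[i], labels[j]
        counts[a][b] += 1
        if a != b:
            counts[b][a] += 1  # keep the matrix symmetric
    dens = [[0.0] * K for _ in range(K)]
    for a in range(K):
        for b in range(K):
            # unordered pairs within a block, ordered-free pairs across blocks
            pairs = sizes[a] * (sizes[a] - 1) // 2 if a == b else sizes[a] * sizes[b]
            dens[a][b] = counts[a][b] / pairs if pairs else float("nan")
    return dens
```

With a few hundred nodes, `block_densities` applied to `simulate_sbm(300, [0.5, 0.5], [[0.3, 0.05], [0.05, 0.2]])` should return values close to `B`.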
Agent-Based Models
Agent-based models are similar in spirit to latent space models (network dynamics arise from pairwise behavior) while still keeping some of the attractive features of ERGMs (explicit transitivity or hub/spoke behavior).
- Jackson, M. O. and Wolinsky, A. (1996), “A Strategic Model of Social and Economic Networks,” J. Econ. Theory, 71, 44-74
- Snijders, T. A. B., Van de Bunt, G. V., and Steglich, C. E. G. (2010), “Introduction to Stochastic Actor-Based Models for Network Dynamics,” Soc. Networks, 32, 44-60
Community Detection
Community detection in networks is like clustering in traditional data analysis. For some reason, this has received a lot of attention, especially in the physics community. This seems like a fad, but it’s worth knowing about.
- Newman, M. E. J. (2006), “Modularity and Community Structure in Networks,” P. Natl. Acad. Sci., 103, 8577-8582
- Leskovec, J., Lang, K. J., Dasgupta, A., and Mahoney, M. W. (2009), “Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters,” Internet Mathematics, 6, 29-123
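The objective behind much of this work is modularity, the score Newman (2006) maximizes: the fraction of edges falling inside communities minus the fraction expected if edges were placed at random with the same degrees. The sketch below assumes a simple undirected, unweighted graph and a dict mapping each node to a community label; it only scores a given partition and does not implement Newman's spectral search.

```python
from collections import defaultdict

def modularity(edges, communities):
    """Newman-Girvan modularity of a partition of a simple undirected graph.

    Uses the per-community form Q = sum_c [ L_c / m - (d_c / (2m))^2 ],
    where m is the number of edges, L_c the number of edges inside
    community c, and d_c the total degree of the nodes in c."""
    m = len(edges)
    inside = defaultdict(int)
    degree_sum = defaultdict(int)
    for i, j in edges:
        ci, cj = communities[i], communities[j]
        degree_sum[ci] += 1
        degree_sum[cj] += 1
        if ci == cj:
            inside[ci] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two disjoint triangles split into their natural communities, Q = 0.5; joining them with a single bridge edge lowers it to 5/14 (about 0.357), and putting every node in one community always gives Q = 0.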
Sampling
Sampling and missing data issues are extremely important, but they are largely ignored, mostly because they give rise to very hard problems. Theoretical results are often negative (respondent-driven sampling, in particular, has drawn many critiques), but without constructive alternatives, it will be hard to advance the field.
- Heckathorn, D. D. (1997), “Respondent-Driven Sampling: A New Approach to the Study of Hidden Populations,” Social Problems, 44, 174-199
- Achlioptas, D., Clauset, A., Kempe, D., and Moore, C. (2009), “On the Bias of Traceroute Sampling: Or, Power-Law Degree Distributions in Regular Graphs,” J. ACM, 56, 1-28
- Handcock, M. S. and Gile, K. J. (2010), “Modeling Social Networks from Sampled Data,” Ann. Appl. Stat., 4, 5-25
Applications
The dirty secret of network science is that the hype is disproportionate to the scientific impact. Below are two of the more important application-driven results. The Christakis and Fowler (2007) paper in particular generated significant attention, both positive and negative.
- Morris, M. and Kretzschmar, M. (1997), “Concurrent Partnerships and the Spread of HIV,” AIDS, 11, 641-648
- Christakis, N. A. and Fowler, J. H. (2007), “The Spread of Obesity in a Large Social Network over 32 Years,” New Engl. J. Med., 357, 370-379