Patrick O. Perry, NYU Stern School of Business
Most of the code here is taken from examples from the ergm R package by the following authors: Mark S. Handcock, David R. Hunter, Carter T. Butts, Steven M. Goodreau, Pavel N. Krivitsky, Martina Morris, Li Wang, Kirk Li, and Skye Bender-deMoll.
We will use the following R packages.
library("network")
library("ergm")
To ensure consistent runs, we set the seed before performing any analysis.
set.seed(0)
We will analyze the Forentine family marriage network.
data(flo)
The previous command loads a 16-by-16 symmetric binary (0/1) matrix. Each row (and column) of this matrix corresponds to one of the following prominent families from the Italian Renaissance:
colnames(flo)
[1] "Acciaiuoli" "Albizzi" "Barbadori" "Bischeri" "Castellani" "Ginori"
[7] "Guadagni" "Lamberteschi" "Medici" "Pazzi" "Peruzzi" "Pucci"
[13] "Ridolfi" "Salviati" "Strozzi" "Tornabuoni"
Here are the matrix entries (with the row and column labels shortened to save space):
x <- as.matrix(flo)
colnames(x) <- substr(colnames(flo), 1, 2)
rownames(x) <- colnames(x)
x
Ac Al Ba Bi Ca Gi Gu La Me Pa Pe Pu Ri Sa St To
Ac 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
Al 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0
Ba 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0
Bi 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0
Ca 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0
Gi 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Gu 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 1
La 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
Me 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 1
Pa 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
Pe 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0
Pu 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Ri 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1
Sa 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
St 0 0 0 1 1 0 0 0 0 0 1 0 1 0 0 0
To 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0
We create a network object from the adjacency matrix.
(flomarriage <- network(flo, directed=FALSE))
Network attributes:
vertices = 16
directed = FALSE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges= 20
missing edges= 0
non-missing edges= 20
Vertex attribute names:
vertex.names
No edge attributes
In our case, the network is undirected, so we specify directed = FALSE
.
Network objects support covariates on the vertices and edges with the %v%
and %e%
operators, respectively. For example, we can set the wealth (in
thousands of lira) of each family:
flomarriage %v% "wealth" <- c(10,36,27,146,55,44,20,8,42,103,48,49,10,48,32,3)
flomarriage
Network attributes:
vertices = 16
directed = FALSE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges= 20
missing edges= 0
non-missing edges= 20
Vertex attribute names:
vertex.names wealth
No edge attributes
(Note that wealth
is now listed as a vertex attribute.)
Here is a plot of the social network.
plot(flomarriage)
Don't read too much into the specific (x, y) coordinates for the edges. As Mike Mahoney likes to point out, pictures of networks often tell you more about the visualization algorithm than they do about the data. When you look at a network plot, you should only pay attention to the connectivity information.
Here is another plot, with the vertex size chosen proportionally to the wealth of the family.
plot(flomarriage, vertex.cex=flomarriage %v% "wealth" / 20,
main="Marriage Ties")
First, we fit a model assuming that edges form independently, with a
probability that depends on the absolute difference (absdiff
) between the
family wealths.
gest <- ergm(flomarriage ~ edges + absdiff("wealth"))
Evaluating log-likelihood at the estimate.
summary(gest)
==========================
Summary of model fit
==========================
Formula: flomarriage ~ edges + absdiff("wealth")
Iterations: 4 out of 20
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges -1.457666 0.354532 0 <1e-04 ***
absdiff.wealth -0.004176 0.007387 0 0.573
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 166.4 on 120 degrees of freedom
Residual Deviance: 107.8 on 118 degrees of freedom
AIC: 111.8 BIC: 117.4 (Smaller is better.)
In this model, there is no apparent association between wealth and tie formation.
We can expand the model by adding terms for the propensity for families to form 2-stars and triangles.
gest <- ergm(flomarriage ~ edges + absdiff("wealth") + kstar(2) + triangle)
Starting maximum likelihood estimation via MCMLE:
Iteration 1 of at most 20:
The log-likelihood improved by 0.009017
Step length converged once. Increasing MCMC sample size.
Iteration 2 of at most 20:
The log-likelihood improved by 0.0005438
Step length converged twice. Stopping.
Evaluating log-likelihood at the estimate. Using 20 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 .
This model was fit using MCMC. To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.
summary(gest)
==========================
Summary of model fit
==========================
Formula: flomarriage ~ edges + absdiff("wealth") + kstar(2) + triangle
Iterations: 2 out of 20
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges -1.367415 0.964019 0 0.159
absdiff.wealth -0.004302 0.007392 0 0.562
kstar2 -0.035045 0.222881 0 0.875
triangle 0.201143 0.774351 0 0.796
Null Deviance: 166.4 on 120 degrees of freedom
Residual Deviance: 107.7 on 116 degrees of freedom
AIC: 115.7 BIC: 126.9 (Smaller is better.)
Even after we include these higher-order effects, there is still no apparent association between wealth and tie formation.
sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets base
other attached packages:
[1] ergm_3.6.0 network_1.13.0 statnet.common_3.3.0 RColorBrewer_1.1-2
[5] knitr_1.12.3
loaded via a namespace (and not attached):
[1] codetools_0.2-14 lattice_0.20-33 digest_0.6.8 MASS_7.3-45 grid_3.2.3
[6] formatR_1.1 magrittr_1.5 coda_0.18-1 evaluate_0.8 stringi_1.0-1
[11] robustbase_0.92-3 Matrix_1.2-3 trust_0.1-7 tools_3.2.3 stringr_1.0.0
[16] DEoptimR_1.0-2 parallel_3.2.3 lpSolve_5.6.13 methods_3.2.3