Exponential Random Graph Models

Patrick O. Perry, NYU Stern School of Business

Most of the code here is taken from examples from the ergm R package by the following authors: Mark S. Handcock, David R. Hunter, Carter T. Butts, Steven M. Goodreau, Pavel N. Krivitsky, Martina Morris, Li Wang, Kirk Li, and Skye Bender-deMoll.

Preliminaries

Computing environment

We will use the following R packages.

library("network")
library("ergm")

To ensure consistent runs, we set the seed before performing any analysis.

set.seed(0)

Data

We will analyze the Forentine family marriage network.

data(flo)

The previous command loads a 16-by-16 symmetric binary (0/1) matrix. Each row (and column) of this matrix corresponds to one of the following prominent families from the Italian Renaissance:

colnames(flo)
 [1] "Acciaiuoli"   "Albizzi"      "Barbadori"    "Bischeri"     "Castellani"   "Ginori"      
 [7] "Guadagni"     "Lamberteschi" "Medici"       "Pazzi"        "Peruzzi"      "Pucci"       
[13] "Ridolfi"      "Salviati"     "Strozzi"      "Tornabuoni"  

Here are the matrix entries (with the row and column labels shortened to save space):

x <- as.matrix(flo)
colnames(x) <- substr(colnames(flo), 1, 2)
rownames(x) <- colnames(x)
x
   Ac Al Ba Bi Ca Gi Gu La Me Pa Pe Pu Ri Sa St To
Ac  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
Al  0  0  0  0  0  1  1  0  1  0  0  0  0  0  0  0
Ba  0  0  0  0  1  0  0  0  1  0  0  0  0  0  0  0
Bi  0  0  0  0  0  0  1  0  0  0  1  0  0  0  1  0
Ca  0  0  1  0  0  0  0  0  0  0  1  0  0  0  1  0
Gi  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Gu  0  1  0  1  0  0  0  1  0  0  0  0  0  0  0  1
La  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0
Me  1  1  1  0  0  0  0  0  0  0  0  0  1  1  0  1
Pa  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0
Pe  0  0  0  1  1  0  0  0  0  0  0  0  0  0  1  0
Pu  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Ri  0  0  0  0  0  0  0  0  1  0  0  0  0  0  1  1
Sa  0  0  0  0  0  0  0  0  1  1  0  0  0  0  0  0
St  0  0  0  1  1  0  0  0  0  0  1  0  1  0  0  0
To  0  0  0  0  0  0  1  0  1  0  0  0  1  0  0  0

Network objects

We create a network object from the adjacency matrix.

(flomarriage <- network(flo, directed=FALSE))
 Network attributes:
  vertices = 16 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 20 
    missing edges= 0 
    non-missing edges= 20 

 Vertex attribute names: 
    vertex.names 

No edge attributes

In our case, the network is undirected, so we specify directed = FALSE.

Network objects support covariates on the vertices and edges with the %v% and %e% operators, respectively. For example, we can set the wealth (in thousands of lira) of each family:

flomarriage %v% "wealth" <- c(10,36,27,146,55,44,20,8,42,103,48,49,10,48,32,3)
flomarriage
 Network attributes:
  vertices = 16 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 20 
    missing edges= 0 
    non-missing edges= 20 

 Vertex attribute names: 
    vertex.names wealth 

No edge attributes

(Note that wealth is now listed as a vertex attribute.)

Visualization

Here is a plot of the social network.

plot(flomarriage)

plot of chunk unnamed-chunk-8

Don't read too much into the specific (x, y) coordinates for the edges. As Mike Mahoney likes to point out, pictures of networks often tell you more about the visualization algorithm than they do about the data. When you look at a network plot, you should only pay attention to the connectivity information.

Here is another plot, with the vertex size chosen proportionally to the wealth of the family.

plot(flomarriage, vertex.cex=flomarriage %v% "wealth" / 20,
     main="Marriage Ties")

plot of chunk unnamed-chunk-9

Modeling

Dyadic independence

First, we fit a model assuming that edges form independently, with a probability that depends on the absolute difference (absdiff) between the family wealths.

gest <- ergm(flomarriage ~ edges + absdiff("wealth"))
Evaluating log-likelihood at the estimate. 
summary(gest)

==========================
Summary of model fit
==========================

Formula:   flomarriage ~ edges + absdiff("wealth")

Iterations:  4 out of 20 

Monte Carlo MLE Results:
                Estimate Std. Error MCMC % p-value    
edges          -1.457666   0.354532      0  <1e-04 ***
absdiff.wealth -0.004176   0.007387      0   0.573    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     Null Deviance: 166.4  on 120  degrees of freedom
 Residual Deviance: 107.8  on 118  degrees of freedom

AIC: 111.8    BIC: 117.4    (Smaller is better.) 

In this model, there is no apparent association between wealth and tie formation.

Higher-order Markov effects

We can expand the model by adding terms for the propensity for families to form 2-stars and triangles.

gest <- ergm(flomarriage ~ edges + absdiff("wealth") + kstar(2) + triangle)
Starting maximum likelihood estimation via MCMLE:
Iteration 1 of at most 20: 
The log-likelihood improved by 0.009017 
Step length converged once. Increasing MCMC sample size.
Iteration 2 of at most 20: 
The log-likelihood improved by 0.0005438 
Step length converged twice. Stopping.
Evaluating log-likelihood at the estimate. Using 20 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 .

This model was fit using MCMC.  To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.
summary(gest)

==========================
Summary of model fit
==========================

Formula:   flomarriage ~ edges + absdiff("wealth") + kstar(2) + triangle

Iterations:  2 out of 20 

Monte Carlo MLE Results:
                Estimate Std. Error MCMC % p-value
edges          -1.367415   0.964019      0   0.159
absdiff.wealth -0.004302   0.007392      0   0.562
kstar2         -0.035045   0.222881      0   0.875
triangle        0.201143   0.774351      0   0.796

     Null Deviance: 166.4  on 120  degrees of freedom
 Residual Deviance: 107.7  on 116  degrees of freedom

AIC: 115.7    BIC: 126.9    (Smaller is better.) 

Even after we include these higher-order effects, there is still no apparent association between wealth and tie formation.

Session information

sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  base     

other attached packages:
[1] ergm_3.6.0           network_1.13.0       statnet.common_3.3.0 RColorBrewer_1.1-2  
[5] knitr_1.12.3        

loaded via a namespace (and not attached):
 [1] codetools_0.2-14  lattice_0.20-33   digest_0.6.8      MASS_7.3-45       grid_3.2.3       
 [6] formatR_1.1       magrittr_1.5      coda_0.18-1       evaluate_0.8      stringi_1.0-1    
[11] robustbase_0.92-3 Matrix_1.2-3      trust_0.1-7       tools_3.2.3       stringr_1.0.0    
[16] DEoptimR_1.0-2    parallel_3.2.3    lpSolve_5.6.13    methods_3.2.3