Patrick O. Perry, NYU Stern School of Business
Most of the code here is taken from examples from the latentnet R package by the following authors: Pavel N. Krivitsky, Mark S. Handcock, Susan M. Shortreed, Jeremy Tantrum, Peter D. Hoff, Li Wang, and Kirk Li.
We will use the following R packages.
library("latentnet")
To ensure consistent runs, we set the seed before performing any analysis.
set.seed(0)
We will analyze Sampson's monk network.
data(sampson)
The previous command loads a symmetric network called samplike
, with edges
indicating monk affinities for each other:
samplike
Network attributes:
vertices = 18
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = FALSE
total edges= 88
missing edges= 0
non-missing edges= 88
Vertex attribute names:
cloisterville group vertex.names
Edge attribute names:
nominations
In a latent position model, each node has an associated latent position. Notes
with nearby latent positions are likely to form ties. To fit such a model, we
need to specify the distance metric, and the dimension. In the example below,
we use euclidean
distance, with d=2
dimensions.
(samp_fit <- ergmm(samplike ~ euclidean(d=2)))
Fitted ERGMM model:
Exponential Random Graph Mixed Model definition
Formula: samplike ~ euclidean(d = 2)
Family: Bernoulli
Terms:
- fixed effects:
- (Intercept)
- latent space of 2 dimensions
For more details of the fit, use the summary
command:
summary(samp_fit)
==========================
Summary of model fit
==========================
Formula: samplike ~ euclidean(d = 2)
Attribute: edges
Model: Bernoulli
MCMC sample of size 4000, draws are 10 iterations apart, after burnin of 10000 iterations.
Covariate coefficients posterior means:
Estimate 2.5% 97.5% 2*min(Pr(>0),Pr(<0))
(Intercept) 2.2078 1.3769 3.1185 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Overall BIC: 381.9872
Likelihood BIC: 252.1252
Latent space/clustering BIC: 129.862
Covariate coefficients MKL:
Estimate
(Intercept) 1.163472
We can check the MCMC convergence diagnositics:
mcmc.diagnostics(samp_fit)
Chain 1
Lag 0
lpY beta.1 Z.1.1 Z.1.2
lpY 1.0000000 0.4742584 0.1678678 0.1115915
beta.1 0.4742584 1.0000000 0.2070105 0.2009421
Z.1.1 0.1678678 0.2070105 1.0000000 -0.2905797
Z.1.2 0.1115915 0.2009421 -0.2905797 1.0000000
Lag 10
lpY beta.1 Z.1.1 Z.1.2
lpY 0.26575884 0.20645925 0.07526682 0.02212651
beta.1 0.21367821 0.22889886 0.08717512 0.03684092
Z.1.1 0.05245539 0.06800761 0.45962472 -0.18533520
Z.1.2 0.02585762 0.01131847 -0.18378066 0.36227548
[[1]]
Quantile (q) = 0.025
Accuracy (r) = +/- 0.0125
Probability (s) = 0.95
Burn-in Total Lower bound Dependence
(M) (N) (Nmin) factor (I)
lpY 30 7170 600 12.0
beta.1 40 7490 600 12.5
Z.1.1 40 7800 600 13.0
Z.1.2 40 7800 600 13.0
We can also use the estimated latent positions to plot the network.
plot(samp_fit)
We can extend the Using Sampson's Monk data, lets fit a latent clustering random effects model
(samp_fit2 <- ergmm(samplike ~ euclidean(d=2, G=3)))
Fitted ERGMM model:
Exponential Random Graph Mixed Model definition
Formula: samplike ~ euclidean(d = 2, G = 3)
Family: Bernoulli
Terms:
- fixed effects:
- (Intercept)
- latent space of 2 dimensions in 3 clusters
summary(samp_fit2)
==========================
Summary of model fit
==========================
Formula: samplike ~ euclidean(d = 2, G = 3)
Attribute: edges
Model: Bernoulli
MCMC sample of size 4000, draws are 10 iterations apart, after burnin of 10000 iterations.
Covariate coefficients posterior means:
Estimate 2.5% 97.5% 2*min(Pr(>0),Pr(<0))
(Intercept) 2.0113 1.3297 2.7751 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Overall BIC: 337.7598
Likelihood BIC: 253.5318
Latent space/clustering BIC: 84.22803
Covariate coefficients MKL:
Estimate
(Intercept) 1.18777
Convergence diagnostics:
mcmc.diagnostics(samp_fit2)
Chain 1
Lag 0
lpY beta.1 Z.1.1 Z.1.2
lpY 1.00000000 0.33940312 0.011056095 0.097391329
beta.1 0.33940312 1.00000000 -0.032049123 0.290736714
Z.1.1 0.01105610 -0.03204912 1.000000000 0.008744742
Z.1.2 0.09739133 0.29073671 0.008744742 1.000000000
Lag 10
lpY beta.1 Z.1.1 Z.1.2
lpY 0.27293386 0.32232116 -0.012964365 0.085824740
beta.1 0.32304469 0.87549029 -0.016190967 0.261043183
Z.1.1 -0.02550588 -0.02546508 0.736025398 -0.005903795
Z.1.2 0.09603661 0.26353799 0.004879753 0.768123924
[[1]]
Quantile (q) = 0.025
Accuracy (r) = +/- 0.0125
Probability (s) = 0.95
Burn-in Total Lower bound Dependence
(M) (N) (Nmin) factor (I)
lpY 30 6890 600 11.5
beta.1 130 21970 600 36.6
Z.1.1 100 18140 600 30.2
Z.1.2 270 52350 600 87.2
Fitted positions and clusters:
plot(samp_fit2, pie=TRUE)
We can add other terms to the model, for example a random effect capturing
heterogeneous popularities of tie receivers. See the documentation for
terms.ergmm
for other possible model terms.
(samp_fit3 <- ergmm(samplike ~ euclidean(d=2, G=3)+rreceiver))
Fitted ERGMM model:
Exponential Random Graph Mixed Model definition
Formula: samplike ~ euclidean(d = 2, G = 3) + rreceiver
Family: Bernoulli
Terms:
- fixed effects:
- (Intercept)
- latent space of 2 dimensions in 3 clusters
- random receiver effects
summary(samp_fit3)
==========================
Summary of model fit
==========================
Formula: samplike ~ euclidean(d = 2, G = 3) + rreceiver
Attribute: edges
Model: Bernoulli
MCMC sample of size 4000, draws are 10 iterations apart, after burnin of 10000 iterations.
Covariate coefficients posterior means:
Estimate 2.5% 97.5% 2*min(Pr(>0),Pr(<0))
(Intercept) 2.0495 1.2101 2.9662 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Receiver effect variance: 1.001218.
Overall BIC: 335.2027
Likelihood BIC: 224.0979
Latent space/clustering BIC: 73.73878
Receiver effect BIC: 37.36593
Covariate coefficients MKL:
Estimate
(Intercept) 1.148547
Convergence diagnostics:
mcmc.diagnostics(samp_fit3)
Chain 1
Lag 0
lpY beta.1 Z.1.1 Z.1.2 receiver.1
lpY 1.00000000 0.25164623 -0.0103108221 -0.05945303 -0.2379307505
beta.1 0.25164623 1.00000000 0.0154538069 -0.18103882 -0.2498164593
Z.1.1 -0.01031082 0.01545381 1.0000000000 0.01348806 0.0009458873
Z.1.2 -0.05945303 -0.18103882 0.0134880607 1.00000000 -0.0165133365
receiver.1 -0.23793075 -0.24981646 0.0009458873 -0.01651334 1.0000000000
Lag 10
lpY beta.1 Z.1.1 Z.1.2 receiver.1
lpY 0.30964946 0.16140865 -0.022500637 -0.0289526903 -0.12152890
beta.1 0.16866577 0.50908539 0.021493637 -0.0770662049 -0.21391357
Z.1.1 -0.01615094 0.03177671 0.833106110 -0.0005132335 -0.02047933
Z.1.2 -0.05070788 -0.08016389 -0.004604112 0.7749657837 0.01769491
receiver.1 -0.11176699 -0.15811914 -0.015626937 0.0009864863 0.32529743
[[1]]
Quantile (q) = 0.025
Accuracy (r) = +/- 0.0125
Probability (s) = 0.95
Burn-in Total Lower bound Dependence
(M) (N) (Nmin) factor (I)
lpY 30 7320 600 12.2
beta.1 50 8670 600 14.4
Z.1.1 240 55640 600 92.7
Z.1.2 120 27960 600 46.6
receiver.1 40 8480 600 14.1
Fitted positions and clusters:
plot(samp_fit3, pie=TRUE)
In the examples above, we fixed the number of clusters (G=3
). We can choose the number of clusters automatically with BICs:
# Fit a set of candidate models
fits<-list(
ergmm(samplike~euclidean(d=2,G=1)),
ergmm(samplike~euclidean(d=2,G=2)),
ergmm(samplike~euclidean(d=2,G=3)),
ergmm(samplike~euclidean(d=2,G=4))
)
# Comptue the BICs
(bics<-reshape(
as.data.frame(t(sapply(fits,
function(x)c(G=x$model$G,unlist(bic.ergmm(x))[c("Y","Z","overall")])))),
list(c("Y","Z","overall")),idvar="G",v.names="BIC",timevar="Component",
times=c("likelihood","clustering","overall"),direction="long"
))
G Component BIC
1.likelihood 1 likelihood 252.16378
2.likelihood 2 likelihood 253.14269
3.likelihood 3 likelihood 253.45024
4.likelihood 4 likelihood 255.00933
1.clustering 1 clustering 127.87458
2.clustering 2 clustering 120.69210
3.clustering 3 clustering 84.55446
4.clustering 4 clustering 85.53359
1.overall 1 overall 380.03835
2.overall 2 overall 373.83480
3.overall 3 overall 338.00470
4.overall 4 overall 340.54293
# Plot BIC versus number of clusters
with(bics,interaction.plot(G,Component,BIC,type="b",xlab="Clusters", ylab="BIC"))
# Summarize and plot whichever fit has the lowest overall BIC:
bestG<-with(bics[bics$Component=="overall",],G[which.min(BIC)])
summary(fits[[bestG]])
==========================
Summary of model fit
==========================
Formula: samplike ~ euclidean(d = 2, G = 3)
Attribute: edges
Model: Bernoulli
MCMC sample of size 4000, draws are 10 iterations apart, after burnin of 10000 iterations.
Covariate coefficients posterior means:
Estimate 2.5% 97.5% 2*min(Pr(>0),Pr(<0))
(Intercept) 2.0153 1.3268 2.7959 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Overall BIC: 338.0047
Likelihood BIC: 253.4502
Latent space/clustering BIC: 84.55446
Covariate coefficients MKL:
Estimate
(Intercept) 1.184587
sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets base
other attached packages:
[1] latentnet_2.7.1 ergm_3.6.0 network_1.13.0 statnet.common_3.3.0
[5] knitr_1.12.3
loaded via a namespace (and not attached):
[1] codetools_0.2-14 mvtnorm_1.0-4 lattice_0.20-33 digest_0.6.8 MASS_7.3-45
[6] grid_3.2.3 formatR_1.1 magrittr_1.5 coda_0.18-1 evaluate_0.8
[11] stringi_1.0-1 robustbase_0.92-3 Matrix_1.2-3 sna_2.3-2 trust_0.1-7
[16] tools_3.2.3 stringr_1.0.0 DEoptimR_1.0-2 abind_1.4-3 parallel_3.2.3
[21] lpSolve_5.6.13 methods_3.2.3