Latent Space Network Models

Patrick O. Perry, NYU Stern School of Business

Most of the code here is taken from examples from the latentnet R package by the following authors: Pavel N. Krivitsky, Mark S. Handcock, Susan M. Shortreed, Jeremy Tantrum, Peter D. Hoff, Li Wang, and Kirk Li.

Preliminaries

Computing environment

We will use the following R packages.

library("latentnet")

To ensure consistent runs, we set the seed before performing any analysis.

set.seed(0)

Data

We will analyze Sampson's monk network.

data(sampson)

The previous command loads a symmetric network called samplike, with edges indicating monk affinities for each other:

samplike
 Network attributes:
  vertices = 18 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  total edges= 88 
    missing edges= 0 
    non-missing edges= 88 

 Vertex attribute names: 
    cloisterville group vertex.names 

 Edge attribute names: 
    nominations 

Latent Position Models

In a latent position model, each node has an associated latent position. Notes with nearby latent positions are likely to form ties. To fit such a model, we need to specify the distance metric, and the dimension. In the example below, we use euclidean distance, with d=2 dimensions.

(samp_fit <- ergmm(samplike ~ euclidean(d=2)))
Fitted ERGMM model:
Exponential Random Graph Mixed Model definition
Formula: samplike ~ euclidean(d = 2)
Family: Bernoulli 
Terms:
- fixed effects:
 - (Intercept) 
- latent space of 2 dimensions

For more details of the fit, use the summary command:

summary(samp_fit)

==========================
Summary of model fit
==========================

Formula:   samplike ~ euclidean(d = 2)
Attribute: edges
Model:     Bernoulli 
MCMC sample of size 4000, draws are 10 iterations apart, after burnin of 10000 iterations.
Covariate coefficients posterior means:
            Estimate   2.5%  97.5% 2*min(Pr(>0),Pr(<0))    
(Intercept)   2.2078 1.3769 3.1185            < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Overall BIC:        381.9872 
Likelihood BIC:     252.1252 
Latent space/clustering BIC:     129.862 

Covariate coefficients MKL:
            Estimate
(Intercept) 1.163472

We can check the MCMC convergence diagnositics:

mcmc.diagnostics(samp_fit)
Chain 1 
Lag 0 
             lpY    beta.1      Z.1.1      Z.1.2
lpY    1.0000000 0.4742584  0.1678678  0.1115915
beta.1 0.4742584 1.0000000  0.2070105  0.2009421
Z.1.1  0.1678678 0.2070105  1.0000000 -0.2905797
Z.1.2  0.1115915 0.2009421 -0.2905797  1.0000000

Lag 10 
              lpY     beta.1       Z.1.1       Z.1.2
lpY    0.26575884 0.20645925  0.07526682  0.02212651
beta.1 0.21367821 0.22889886  0.08717512  0.03684092
Z.1.1  0.05245539 0.06800761  0.45962472 -0.18533520
Z.1.2  0.02585762 0.01131847 -0.18378066  0.36227548

plot of chunk unnamed-chunk-7plot of chunk unnamed-chunk-7

[[1]]

Quantile (q) = 0.025
Accuracy (r) = +/- 0.0125
Probability (s) = 0.95 

        Burn-in  Total Lower bound  Dependence
        (M)      (N)   (Nmin)       factor (I)
 lpY    30       7170  600          12.0      
 beta.1 40       7490  600          12.5      
 Z.1.1  40       7800  600          13.0      
 Z.1.2  40       7800  600          13.0      

We can also use the estimated latent positions to plot the network.

plot(samp_fit)

plot of chunk unnamed-chunk-8

Model-Based Clustering Models

We can extend the Using Sampson's Monk data, lets fit a latent clustering random effects model

(samp_fit2 <- ergmm(samplike ~ euclidean(d=2, G=3)))
Fitted ERGMM model:
Exponential Random Graph Mixed Model definition
Formula: samplike ~ euclidean(d = 2, G = 3)
Family: Bernoulli 
Terms:
- fixed effects:
 - (Intercept) 
- latent space of 2 dimensions in 3 clusters
summary(samp_fit2)

==========================
Summary of model fit
==========================

Formula:   samplike ~ euclidean(d = 2, G = 3)
Attribute: edges
Model:     Bernoulli 
MCMC sample of size 4000, draws are 10 iterations apart, after burnin of 10000 iterations.
Covariate coefficients posterior means:
            Estimate   2.5%  97.5% 2*min(Pr(>0),Pr(<0))    
(Intercept)   2.0113 1.3297 2.7751            < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Overall BIC:        337.7598 
Likelihood BIC:     253.5318 
Latent space/clustering BIC:     84.22803 

Covariate coefficients MKL:
            Estimate
(Intercept)  1.18777

Convergence diagnostics:

mcmc.diagnostics(samp_fit2)
Chain 1 
Lag 0 
              lpY      beta.1        Z.1.1       Z.1.2
lpY    1.00000000  0.33940312  0.011056095 0.097391329
beta.1 0.33940312  1.00000000 -0.032049123 0.290736714
Z.1.1  0.01105610 -0.03204912  1.000000000 0.008744742
Z.1.2  0.09739133  0.29073671  0.008744742 1.000000000

Lag 10 
               lpY      beta.1        Z.1.1        Z.1.2
lpY     0.27293386  0.32232116 -0.012964365  0.085824740
beta.1  0.32304469  0.87549029 -0.016190967  0.261043183
Z.1.1  -0.02550588 -0.02546508  0.736025398 -0.005903795
Z.1.2   0.09603661  0.26353799  0.004879753  0.768123924

plot of chunk unnamed-chunk-10plot of chunk unnamed-chunk-10

[[1]]

Quantile (q) = 0.025
Accuracy (r) = +/- 0.0125
Probability (s) = 0.95 

        Burn-in  Total Lower bound  Dependence
        (M)      (N)   (Nmin)       factor (I)
 lpY    30       6890  600          11.5      
 beta.1 130      21970 600          36.6      
 Z.1.1  100      18140 600          30.2      
 Z.1.2  270      52350 600          87.2      

Fitted positions and clusters:

plot(samp_fit2, pie=TRUE)

plot of chunk unnamed-chunk-11

Popularity Random Effect

We can add other terms to the model, for example a random effect capturing heterogeneous popularities of tie receivers. See the documentation for terms.ergmm for other possible model terms.

(samp_fit3 <- ergmm(samplike ~ euclidean(d=2, G=3)+rreceiver))
Fitted ERGMM model:
Exponential Random Graph Mixed Model definition
Formula: samplike ~ euclidean(d = 2, G = 3) + rreceiver
Family: Bernoulli 
Terms:
- fixed effects:
 - (Intercept) 
- latent space of 2 dimensions in 3 clusters
- random receiver effects
summary(samp_fit3)

==========================
Summary of model fit
==========================

Formula:   samplike ~ euclidean(d = 2, G = 3) + rreceiver
Attribute: edges
Model:     Bernoulli 
MCMC sample of size 4000, draws are 10 iterations apart, after burnin of 10000 iterations.
Covariate coefficients posterior means:
            Estimate   2.5%  97.5% 2*min(Pr(>0),Pr(<0))    
(Intercept)   2.0495 1.2101 2.9662            < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Receiver effect variance: 1.001218.
Overall BIC:        335.2027 
Likelihood BIC:     224.0979 
Latent space/clustering BIC:     73.73878 
Receiver effect BIC:     37.36593 

Covariate coefficients MKL:
            Estimate
(Intercept) 1.148547

Convergence diagnostics:

mcmc.diagnostics(samp_fit3)
Chain 1 
Lag 0 
                   lpY      beta.1         Z.1.1       Z.1.2    receiver.1
lpY         1.00000000  0.25164623 -0.0103108221 -0.05945303 -0.2379307505
beta.1      0.25164623  1.00000000  0.0154538069 -0.18103882 -0.2498164593
Z.1.1      -0.01031082  0.01545381  1.0000000000  0.01348806  0.0009458873
Z.1.2      -0.05945303 -0.18103882  0.0134880607  1.00000000 -0.0165133365
receiver.1 -0.23793075 -0.24981646  0.0009458873 -0.01651334  1.0000000000

Lag 10 
                   lpY      beta.1        Z.1.1         Z.1.2  receiver.1
lpY         0.30964946  0.16140865 -0.022500637 -0.0289526903 -0.12152890
beta.1      0.16866577  0.50908539  0.021493637 -0.0770662049 -0.21391357
Z.1.1      -0.01615094  0.03177671  0.833106110 -0.0005132335 -0.02047933
Z.1.2      -0.05070788 -0.08016389 -0.004604112  0.7749657837  0.01769491
receiver.1 -0.11176699 -0.15811914 -0.015626937  0.0009864863  0.32529743

plot of chunk unnamed-chunk-13plot of chunk unnamed-chunk-13plot of chunk unnamed-chunk-13

[[1]]

Quantile (q) = 0.025
Accuracy (r) = +/- 0.0125
Probability (s) = 0.95 

            Burn-in  Total Lower bound  Dependence
            (M)      (N)   (Nmin)       factor (I)
 lpY        30       7320  600          12.2      
 beta.1     50       8670  600          14.4      
 Z.1.1      240      55640 600          92.7      
 Z.1.2      120      27960 600          46.6      
 receiver.1 40       8480  600          14.1      

Fitted positions and clusters:

plot(samp_fit3, pie=TRUE)

plot of chunk unnamed-chunk-14

Choosing the Number of Clusters

In the examples above, we fixed the number of clusters (G=3). We can choose the number of clusters automatically with BICs:

# Fit a set of candidate models
fits<-list(
           ergmm(samplike~euclidean(d=2,G=1)),
           ergmm(samplike~euclidean(d=2,G=2)),
           ergmm(samplike~euclidean(d=2,G=3)),
           ergmm(samplike~euclidean(d=2,G=4))
           )


# Comptue the BICs
(bics<-reshape(
    as.data.frame(t(sapply(fits,
                           function(x)c(G=x$model$G,unlist(bic.ergmm(x))[c("Y","Z","overall")])))),
    list(c("Y","Z","overall")),idvar="G",v.names="BIC",timevar="Component",
    times=c("likelihood","clustering","overall"),direction="long"
    ))
             G  Component       BIC
1.likelihood 1 likelihood 252.16378
2.likelihood 2 likelihood 253.14269
3.likelihood 3 likelihood 253.45024
4.likelihood 4 likelihood 255.00933
1.clustering 1 clustering 127.87458
2.clustering 2 clustering 120.69210
3.clustering 3 clustering  84.55446
4.clustering 4 clustering  85.53359
1.overall    1    overall 380.03835
2.overall    2    overall 373.83480
3.overall    3    overall 338.00470
4.overall    4    overall 340.54293
# Plot BIC versus number of clusters
with(bics,interaction.plot(G,Component,BIC,type="b",xlab="Clusters", ylab="BIC"))

plot of chunk unnamed-chunk-15

# Summarize and plot whichever fit has the lowest overall BIC:
bestG<-with(bics[bics$Component=="overall",],G[which.min(BIC)])
summary(fits[[bestG]])

==========================
Summary of model fit
==========================

Formula:   samplike ~ euclidean(d = 2, G = 3)
Attribute: edges
Model:     Bernoulli 
MCMC sample of size 4000, draws are 10 iterations apart, after burnin of 10000 iterations.
Covariate coefficients posterior means:
            Estimate   2.5%  97.5% 2*min(Pr(>0),Pr(<0))    
(Intercept)   2.0153 1.3268 2.7959            < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Overall BIC:        338.0047 
Likelihood BIC:     253.4502 
Latent space/clustering BIC:     84.55446 

Covariate coefficients MKL:
            Estimate
(Intercept) 1.184587

Session information

sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  base     

other attached packages:
[1] latentnet_2.7.1      ergm_3.6.0           network_1.13.0       statnet.common_3.3.0
[5] knitr_1.12.3        

loaded via a namespace (and not attached):
 [1] codetools_0.2-14  mvtnorm_1.0-4     lattice_0.20-33   digest_0.6.8      MASS_7.3-45      
 [6] grid_3.2.3        formatR_1.1       magrittr_1.5      coda_0.18-1       evaluate_0.8     
[11] stringi_1.0-1     robustbase_0.92-3 Matrix_1.2-3      sna_2.3-2         trust_0.1-7      
[16] tools_3.2.3       stringr_1.0.0     DEoptimR_1.0-2    abind_1.4-3       parallel_3.2.3   
[21] lpSolve_5.6.13    methods_3.2.3