# Latent Space Network Models

Patrick O. Perry, NYU Stern School of Business

Most of the code here is taken from examples from the latentnet R package by the following authors: Pavel N. Krivitsky, Mark S. Handcock, Susan M. Shortreed, Jeremy Tantrum, Peter D. Hoff, Li Wang, and Kirk Li.

## Preliminaries

### Computing environment

We will use the following R packages.

``````library("latentnet")
``````

To ensure consistent runs, we set the seed before performing any analysis.

``````set.seed(0)
``````

### Data

We will analyze Sampson's monk network.

``````data(sampson)
``````

The previous command loads a symmetric network called `samplike`, with edges indicating monk affinities for each other:

``````samplike
``````
`````` Network attributes:
vertices = 18
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = FALSE
total edges= 88
missing edges= 0
non-missing edges= 88

Vertex attribute names:
cloisterville group vertex.names

Edge attribute names:
nominations
``````

## Latent Position Models

In a latent position model, each node has an associated latent position. Notes with nearby latent positions are likely to form ties. To fit such a model, we need to specify the distance metric, and the dimension. In the example below, we use `euclidean` distance, with `d=2` dimensions.

``````(samp_fit <- ergmm(samplike ~ euclidean(d=2)))
``````
``````Fitted ERGMM model:
Exponential Random Graph Mixed Model definition
Formula: samplike ~ euclidean(d = 2)
Family: Bernoulli
Terms:
- fixed effects:
- (Intercept)
- latent space of 2 dimensions
``````

For more details of the fit, use the `summary` command:

``````summary(samp_fit)
``````
``````
==========================
Summary of model fit
==========================

Formula:   samplike ~ euclidean(d = 2)
Attribute: edges
Model:     Bernoulli
MCMC sample of size 4000, draws are 10 iterations apart, after burnin of 10000 iterations.
Covariate coefficients posterior means:
Estimate   2.5%  97.5% 2*min(Pr(>0),Pr(<0))
(Intercept)   2.2078 1.3769 3.1185            < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Overall BIC:        381.9872
Likelihood BIC:     252.1252
Latent space/clustering BIC:     129.862

Covariate coefficients MKL:
Estimate
(Intercept) 1.163472
``````

We can check the MCMC convergence diagnositics:

``````mcmc.diagnostics(samp_fit)
``````
``````Chain 1
Lag 0
lpY    beta.1      Z.1.1      Z.1.2
lpY    1.0000000 0.4742584  0.1678678  0.1115915
beta.1 0.4742584 1.0000000  0.2070105  0.2009421
Z.1.1  0.1678678 0.2070105  1.0000000 -0.2905797
Z.1.2  0.1115915 0.2009421 -0.2905797  1.0000000

Lag 10
lpY     beta.1       Z.1.1       Z.1.2
lpY    0.26575884 0.20645925  0.07526682  0.02212651
beta.1 0.21367821 0.22889886  0.08717512  0.03684092
Z.1.1  0.05245539 0.06800761  0.45962472 -0.18533520
Z.1.2  0.02585762 0.01131847 -0.18378066  0.36227548
``````

``````[[1]]

Quantile (q) = 0.025
Accuracy (r) = +/- 0.0125
Probability (s) = 0.95

Burn-in  Total Lower bound  Dependence
(M)      (N)   (Nmin)       factor (I)
lpY    30       7170  600          12.0
beta.1 40       7490  600          12.5
Z.1.1  40       7800  600          13.0
Z.1.2  40       7800  600          13.0
``````

We can also use the estimated latent positions to plot the network.

``````plot(samp_fit)
``````

## Model-Based Clustering Models

We can extend the Using Sampson's Monk data, lets fit a latent clustering random effects model

``````(samp_fit2 <- ergmm(samplike ~ euclidean(d=2, G=3)))
``````
``````Fitted ERGMM model:
Exponential Random Graph Mixed Model definition
Formula: samplike ~ euclidean(d = 2, G = 3)
Family: Bernoulli
Terms:
- fixed effects:
- (Intercept)
- latent space of 2 dimensions in 3 clusters
``````
``````summary(samp_fit2)
``````
``````
==========================
Summary of model fit
==========================

Formula:   samplike ~ euclidean(d = 2, G = 3)
Attribute: edges
Model:     Bernoulli
MCMC sample of size 4000, draws are 10 iterations apart, after burnin of 10000 iterations.
Covariate coefficients posterior means:
Estimate   2.5%  97.5% 2*min(Pr(>0),Pr(<0))
(Intercept)   2.0113 1.3297 2.7751            < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Overall BIC:        337.7598
Likelihood BIC:     253.5318
Latent space/clustering BIC:     84.22803

Covariate coefficients MKL:
Estimate
(Intercept)  1.18777
``````

Convergence diagnostics:

``````mcmc.diagnostics(samp_fit2)
``````
``````Chain 1
Lag 0
lpY      beta.1        Z.1.1       Z.1.2
lpY    1.00000000  0.33940312  0.011056095 0.097391329
beta.1 0.33940312  1.00000000 -0.032049123 0.290736714
Z.1.1  0.01105610 -0.03204912  1.000000000 0.008744742
Z.1.2  0.09739133  0.29073671  0.008744742 1.000000000

Lag 10
lpY      beta.1        Z.1.1        Z.1.2
lpY     0.27293386  0.32232116 -0.012964365  0.085824740
beta.1  0.32304469  0.87549029 -0.016190967  0.261043183
Z.1.1  -0.02550588 -0.02546508  0.736025398 -0.005903795
Z.1.2   0.09603661  0.26353799  0.004879753  0.768123924
``````

``````[[1]]

Quantile (q) = 0.025
Accuracy (r) = +/- 0.0125
Probability (s) = 0.95

Burn-in  Total Lower bound  Dependence
(M)      (N)   (Nmin)       factor (I)
lpY    30       6890  600          11.5
beta.1 130      21970 600          36.6
Z.1.1  100      18140 600          30.2
Z.1.2  270      52350 600          87.2
``````

Fitted positions and clusters:

``````plot(samp_fit2, pie=TRUE)
``````

## Popularity Random Effect

We can add other terms to the model, for example a random effect capturing heterogeneous popularities of tie receivers. See the documentation for `terms.ergmm` for other possible model terms.

``````(samp_fit3 <- ergmm(samplike ~ euclidean(d=2, G=3)+rreceiver))
``````
``````Fitted ERGMM model:
Exponential Random Graph Mixed Model definition
Formula: samplike ~ euclidean(d = 2, G = 3) + rreceiver
Family: Bernoulli
Terms:
- fixed effects:
- (Intercept)
- latent space of 2 dimensions in 3 clusters
``````
``````summary(samp_fit3)
``````
``````
==========================
Summary of model fit
==========================

Formula:   samplike ~ euclidean(d = 2, G = 3) + rreceiver
Attribute: edges
Model:     Bernoulli
MCMC sample of size 4000, draws are 10 iterations apart, after burnin of 10000 iterations.
Covariate coefficients posterior means:
Estimate   2.5%  97.5% 2*min(Pr(>0),Pr(<0))
(Intercept)   2.0495 1.2101 2.9662            < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Overall BIC:        335.2027
Likelihood BIC:     224.0979
Latent space/clustering BIC:     73.73878

Covariate coefficients MKL:
Estimate
(Intercept) 1.148547
``````

Convergence diagnostics:

``````mcmc.diagnostics(samp_fit3)
``````
``````Chain 1
Lag 0
lpY         1.00000000  0.25164623 -0.0103108221 -0.05945303 -0.2379307505
beta.1      0.25164623  1.00000000  0.0154538069 -0.18103882 -0.2498164593
Z.1.1      -0.01031082  0.01545381  1.0000000000  0.01348806  0.0009458873
Z.1.2      -0.05945303 -0.18103882  0.0134880607  1.00000000 -0.0165133365
receiver.1 -0.23793075 -0.24981646  0.0009458873 -0.01651334  1.0000000000

Lag 10
lpY         0.30964946  0.16140865 -0.022500637 -0.0289526903 -0.12152890
beta.1      0.16866577  0.50908539  0.021493637 -0.0770662049 -0.21391357
Z.1.1      -0.01615094  0.03177671  0.833106110 -0.0005132335 -0.02047933
Z.1.2      -0.05070788 -0.08016389 -0.004604112  0.7749657837  0.01769491
receiver.1 -0.11176699 -0.15811914 -0.015626937  0.0009864863  0.32529743
``````

``````[[1]]

Quantile (q) = 0.025
Accuracy (r) = +/- 0.0125
Probability (s) = 0.95

Burn-in  Total Lower bound  Dependence
(M)      (N)   (Nmin)       factor (I)
lpY        30       7320  600          12.2
beta.1     50       8670  600          14.4
Z.1.1      240      55640 600          92.7
Z.1.2      120      27960 600          46.6
``````

Fitted positions and clusters:

``````plot(samp_fit3, pie=TRUE)
``````

## Choosing the Number of Clusters

In the examples above, we fixed the number of clusters (`G=3`). We can choose the number of clusters automatically with BICs:

``````# Fit a set of candidate models
fits<-list(
ergmm(samplike~euclidean(d=2,G=1)),
ergmm(samplike~euclidean(d=2,G=2)),
ergmm(samplike~euclidean(d=2,G=3)),
ergmm(samplike~euclidean(d=2,G=4))
)

# Comptue the BICs
(bics<-reshape(
as.data.frame(t(sapply(fits,
function(x)c(G=x\$model\$G,unlist(bic.ergmm(x))[c("Y","Z","overall")])))),
list(c("Y","Z","overall")),idvar="G",v.names="BIC",timevar="Component",
times=c("likelihood","clustering","overall"),direction="long"
))
``````
``````             G  Component       BIC
1.likelihood 1 likelihood 252.16378
2.likelihood 2 likelihood 253.14269
3.likelihood 3 likelihood 253.45024
4.likelihood 4 likelihood 255.00933
1.clustering 1 clustering 127.87458
2.clustering 2 clustering 120.69210
3.clustering 3 clustering  84.55446
4.clustering 4 clustering  85.53359
1.overall    1    overall 380.03835
2.overall    2    overall 373.83480
3.overall    3    overall 338.00470
4.overall    4    overall 340.54293
``````
``````# Plot BIC versus number of clusters
with(bics,interaction.plot(G,Component,BIC,type="b",xlab="Clusters", ylab="BIC"))
``````

``````# Summarize and plot whichever fit has the lowest overall BIC:
bestG<-with(bics[bics\$Component=="overall",],G[which.min(BIC)])
summary(fits[[bestG]])
``````
``````
==========================
Summary of model fit
==========================

Formula:   samplike ~ euclidean(d = 2, G = 3)
Attribute: edges
Model:     Bernoulli
MCMC sample of size 4000, draws are 10 iterations apart, after burnin of 10000 iterations.
Covariate coefficients posterior means:
Estimate   2.5%  97.5% 2*min(Pr(>0),Pr(<0))
(Intercept)   2.0153 1.3268 2.7959            < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Overall BIC:        338.0047
Likelihood BIC:     253.4502
Latent space/clustering BIC:     84.55446

Covariate coefficients MKL:
Estimate
(Intercept) 1.184587
``````

## Session information

``````sessionInfo()
``````
``````R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  base

other attached packages:
[1] latentnet_2.7.1      ergm_3.6.0           network_1.13.0       statnet.common_3.3.0
[5] knitr_1.12.3

loaded via a namespace (and not attached):
[1] codetools_0.2-14  mvtnorm_1.0-4     lattice_0.20-33   digest_0.6.8      MASS_7.3-45
[6] grid_3.2.3        formatR_1.1       magrittr_1.5      coda_0.18-1       evaluate_0.8
[11] stringi_1.0-1     robustbase_0.92-3 Matrix_1.2-3      sna_2.3-2         trust_0.1-7
[16] tools_3.2.3       stringr_1.0.0     DEoptimR_1.0-2    abind_1.4-3       parallel_3.2.3
[21] lpSolve_5.6.13    methods_3.2.3
``````