```{r setup, cache=FALSE, echo=FALSE, global.par=TRUE} library("knitr") # opts_chunk # terminal output options(width = 100) # code chunk options opts_chunk$set(cache=TRUE, fig.align="center", comment=NA, echo=TRUE, highlight=FALSE, tidy=FALSE, warning=FALSE, message=FALSE) ``` Latent Space Network Models =========================== *Patrick O. Perry, NYU Stern School of Business* Most of the code here is taken from examples from the [latentnet R package](https://cran.r-project.org/web/packages/latentnet/index.html) by the following authors: Pavel N. Krivitsky, Mark S. Handcock, Susan M. Shortreed, Jeremy Tantrum, Peter D. Hoff, Li Wang, and Kirk Li. Preliminaries ------------- ### Computing environment We will use the following R packages. ```{r} library("latentnet") ``` To ensure consistent runs, we set the seed before performing any analysis. ```{r} set.seed(0) ``` ### Data We will analyze Sampson's monk network. ```{r} data(sampson) ``` The previous command loads a symmetric network called `samplike`, with edges indicating monk affinities for each other: ```{r} samplike ``` Latent Position Models ---------------------- In a latent position model, each node has an associated latent position. Notes with nearby latent positions are likely to form ties. To fit such a model, we need to specify the distance metric, and the dimension. In the example below, we use `euclidean` distance, with `d=2` dimensions. ```{r} (samp_fit <- ergmm(samplike ~ euclidean(d=2))) ``` For more details of the fit, use the `summary` command: ```{r} summary(samp_fit) ``` We can check the MCMC convergence diagnositics: ```{r} mcmc.diagnostics(samp_fit) ``` We can also use the estimated latent positions to plot the network. ```{r} plot(samp_fit) ``` Model-Based Clustering Models ----------------------------- We can extend the Using Sampson's Monk data, lets fit a latent clustering random effects model ```{r} (samp_fit2 <- ergmm(samplike ~ euclidean(d=2, G=3))) summary(samp_fit2) ``` Convergence diagnostics: ```{r} mcmc.diagnostics(samp_fit2) ``` Fitted positions and clusters: ```{r} plot(samp_fit2, pie=TRUE) ``` Popularity Random Effect ------------------------ We can add other terms to the model, for example a random effect capturing heterogeneous popularities of tie receivers. See the documentation for `terms.ergmm` for other possible model terms. ```{r} (samp_fit3 <- ergmm(samplike ~ euclidean(d=2, G=3)+rreceiver)) summary(samp_fit3) ``` Convergence diagnostics: ```{r} mcmc.diagnostics(samp_fit3) ``` Fitted positions and clusters: ```{r} plot(samp_fit3, pie=TRUE) ``` Choosing the Number of Clusters ------------------------------- In the examples above, we fixed the number of clusters (`G=3`). We can choose the number of clusters automatically with BICs: ```{r} # Fit a set of candidate models fits<-list( ergmm(samplike~euclidean(d=2,G=1)), ergmm(samplike~euclidean(d=2,G=2)), ergmm(samplike~euclidean(d=2,G=3)), ergmm(samplike~euclidean(d=2,G=4)) ) # Comptue the BICs (bics<-reshape( as.data.frame(t(sapply(fits, function(x)c(G=x$model$G,unlist(bic.ergmm(x))[c("Y","Z","overall")])))), list(c("Y","Z","overall")),idvar="G",v.names="BIC",timevar="Component", times=c("likelihood","clustering","overall"),direction="long" )) # Plot BIC versus number of clusters with(bics,interaction.plot(G,Component,BIC,type="b",xlab="Clusters", ylab="BIC")) # Summarize and plot whichever fit has the lowest overall BIC: bestG<-with(bics[bics$Component=="overall",],G[which.min(BIC)]) summary(fits[[bestG]]) ``` Session information ------------------- ```{r} sessionInfo() ```