# Statistics for Social Data

## Course staff

- Instructor
  - Prof. Perry (pperry@stern.nyu.edu)
  - Office Hours: Thursday, 2:00PM-3:50PM, KMC 8-63

## Lecture slides

- Case Study: The Federalist Papers (Rmd)
- Segmenting Text (Rmd)
- Part-of-Speech Tagging (Rmd)
- Matrix Decompositions (Rmd)
- Topic Models (Rmd)
- Sentiment Analysis (Rmd)
- Exponential Random Graph Models (Rmd)
- Latent Space Network Models (Rmd)
- Network Clustering (Community Detection) (Rmd)

## Handouts and assignments

## Readings

- Perry and Wolfe (2013). Point process modelling for directed interaction networks.
- Aral and Walker (2012). Identifying influential and susceptible members of social networks.
- Newman (2006). Finding community structure in networks using the eigenvectors of matrices.
- Bickel and Chen (2009). A nonparametric view of network models and Newman-Girvan and other modularities.
- Handcock et al. (2007). Model-based clustering for social networks.
- Robins et al. (2007). An introduction to exponential random graph (p-star) models for social networks.
- Robins et al. (2007). Recent developments in exponential random graph (p-star) models for social networks.
- Liu (2011). Opinion Mining and Sentiment Analysis. Chapter 11 from Web Data Mining.
- Chen, Lin, and Zhou (2015). Statistical decision making for optimal budget allocation in crowd labeling.
- Řehůřek (2014). Word2vec tutorial.
- Mikolov, Sutskever, Chen, Corrado, and Dean (2013). Distributed representations of words and phrases and their compositionality.
- Levy and Goldberg (2014). Neural word embedding as implicit matrix factorization.
- Blei, Ng, and Jordan (2003). Latent Dirichlet allocation.
- Lee and Seung (1999). Learning the parts of objects by non-negative matrix factorization.
- Kolda and O'Leary (1998). A semidiscrete matrix decomposition for latent semantic indexing in information retrieval.
- Manning, Raghavan, and Schütze (2008). Matrix Decompositions and Latent Semantic Indexing. Chapter 18 from Introduction to Information Retrieval.
- Honnibal (2013). A Good Part-of-Speech Tagger in about 200 Lines of Python.
- Gimpel, Schneider, O'Connor, Das, Mills, Eisenstein, Heilman, Yogatama, Flanigan, and Smith (2011). Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments.
- Jurafsky and Martin (2015). Speech and Language Processing (3rd ed. draft).
- Sliusarenko and Dyomkin (2014). How to split sentences.
- Read, Dridan, Oepen, and Solberg (2012). Sentence boundary detection: A long solved problem?
- Kiss and Strunk (2006). Unsupervised multilingual sentence boundary detection.
- Mosteller and Wallace (1963). Inference in an authorship problem.