Collecting social network data is notoriously difficult, so networks are often observed only indirectly or with missing observations. In this talk, we address two such scenarios: inference on network measures when the network itself is not observed, and inference on regression coefficients when actors in the network have latent block memberships.

Network data are often expensive to collect because collecting them requires eliciting the connections among all members of a population. Collecting aggregate relational data (ARD), that is, responses to questions of the form “How many of your social connections have trait k?”, is much more cost-effective. In the first part of the talk, we show that ARD can be used to estimate both individual-level and global network properties. Building on the latent surface model proposed by McCormick and Zheng (2015), we connect ARD to a network formation model, which allows us to obtain draws from the posterior distribution over graphs given the ARD response vector. We then compute network statistics on these posterior samples. We demonstrate our method in simulations and by replicating results from empirical field experiments in which the complete graph was observed. We establish consistency results for the estimated network formation model parameters, for the network statistics, and for regression coefficients when an estimated network measure serves as the outcome or as an independent variable.
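To make the ARD definition and the "compute statistics on posterior samples" step concrete, here is a minimal sketch under simplifying assumptions. If `adj` is a 0/1 adjacency matrix and `traits` is a 0/1 node-by-trait matrix, the ARD matrix is simply their product: entry (i, k) counts i's neighbors with trait k. The second function summarizes any network statistic over a collection of sampled graphs. All function names here are hypothetical, the latent surface model is not fitted, and `random_graph` is only an Erdos-Renyi placeholder standing in for posterior draws.

```python
import numpy as np

def aggregate_relational_data(adj, traits):
    """ARD matrix: entry (i, k) is the number of node i's neighbors with trait k.

    adj:    (n, n) symmetric 0/1 adjacency matrix with zero diagonal.
    traits: (n, K) 0/1 matrix; traits[j, k] = 1 if node j has trait k.
    """
    return adj @ traits

def density(adj):
    """Fraction of possible undirected ties that are present."""
    n = adj.shape[0]
    return adj.sum() / (n * (n - 1))

def summarize_statistic(graph_samples, stat, level=0.95):
    """Mean and equal-tailed interval of a network statistic computed on
    each sampled graph (e.g. draws from a posterior over graphs)."""
    values = np.array([stat(g) for g in graph_samples], dtype=float)
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(values, [alpha, 1.0 - alpha])
    return values.mean(), (lo, hi)

def random_graph(n, p, rng):
    """PLACEHOLDER sampler (Erdos-Renyi) standing in for posterior draws
    from a fitted network formation model, which this sketch does not fit."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(int)

if __name__ == "__main__":
    adj = np.array([[0, 1, 1, 0],
                    [1, 0, 1, 1],
                    [1, 1, 0, 0],
                    [0, 1, 0, 0]])
    traits = np.array([[1, 0],
                       [0, 1],
                       [1, 1],
                       [0, 0]])
    # Rows: [1 2], [2 1], [1 1], [0 1]
    print(aggregate_relational_data(adj, traits))

    rng = np.random.default_rng(0)
    samples = [random_graph(50, 0.1, rng) for _ in range(200)]
    print(summarize_statistic(samples, density))
```

The point of the design is that once graph samples are available, any statistic (degree, density, clustering, centrality) can be plugged in as `stat`, and its uncertainty is summarized across samples rather than computed from a single point estimate of the graph.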

In the second part of the talk, we discuss inference on regression coefficients when the outcome of a linear regression is the interaction between a pair of nodes and the nodes belong to unobserved blocks. Building on the exchangeable errors proposed by Marrs, McCormick, and Fosdick (2017), we propose block-exchangeable errors and a two-step procedure for estimating the standard errors. We show, through simulations and proofs, that block-exchangeable errors are preferable to exchangeable errors when latent block memberships and observed covariates are dependent. In addition, we develop an algorithm to obtain the block-exchangeable estimator when observations are left-censored at zero, since real network data are typically not fully connected.
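The following is a minimal sketch of the general idea behind exchangeable-type standard errors for dyadic regression, not the talk's two-step procedure or its censored-data algorithm. It assumes undirected relations, fits OLS, pools residual products within covariance classes (same dyad, or dyads sharing exactly one node; disjoint dyads are assumed uncorrelated), and plugs the pooled values into a sandwich variance. The block-dependent classes in `pair_class` are a HYPOTHETICAL parameterization chosen for illustration, and the block labels are taken as known, whereas in the talk's setting they are latent and would have to be estimated first. With a single block, the classes collapse to one variance and one shared-node covariance.

```python
import itertools
import numpy as np

def dyads(n):
    """All unordered node pairs (i, j) with i < j, for undirected relations."""
    return list(itertools.combinations(range(n), 2))

def pair_class(d1, d2, blocks):
    """Covariance class of the errors on dyads d1 and d2 (None = assumed zero).

    HYPOTHETICAL parameterization: variances are keyed by the blocks of the
    dyad's two nodes; covariances between dyads sharing one node are keyed by
    the block of the shared node. With a single block this reduces to one
    variance and one shared-node covariance.
    """
    shared = set(d1) & set(d2)
    if len(shared) == 2:  # same dyad
        return ("var", tuple(sorted((blocks[d1[0]], blocks[d1[1]]))))
    if len(shared) == 1:  # dyads share exactly one node
        k = next(iter(shared))
        return ("share", blocks[k])
    return None  # disjoint dyads: covariance assumed zero

def dyadic_regression_se(X, y, n, blocks):
    """OLS on dyadic data with a sandwich variance built from pooled residual
    covariances. Rows of X and y follow the order of dyads(n)."""
    D = dyads(n)
    m = len(D)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    e = y - X @ beta

    # Pool residual products e_a * e_b within each covariance class.
    sums, counts, cls = {}, {}, {}
    for a in range(m):
        for b in range(m):
            c = pair_class(D[a], D[b], blocks)
            if c is None:
                continue
            cls[(a, b)] = c
            sums[c] = sums.get(c, 0.0) + e[a] * e[b]
            counts[c] = counts.get(c, 0) + 1

    # Plug pooled estimates into Omega and form the sandwich variance.
    # Note: the plug-in Omega is not guaranteed to be positive semidefinite.
    omega = np.zeros((m, m))
    for (a, b), c in cls.items():
        omega[a, b] = sums[c] / counts[c]
    V = XtX_inv @ (X.T @ omega @ X) @ XtX_inv
    return beta, np.sqrt(np.diag(V)), omega

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 15
    blocks = np.repeat([0, 1], [8, 7])       # block labels, assumed known here
    z = rng.normal(loc=blocks, scale=1.0)    # node covariate that depends on block
    a = rng.normal(scale=np.where(blocks == 1, 2.0, 0.5))  # block-dependent node effects
    D = dyads(n)
    x = np.array([abs(z[i] - z[j]) for i, j in D])
    X = np.column_stack([np.ones(len(D)), x])
    y = 1.0 + 0.5 * x + np.array([a[i] + a[j] for i, j in D]) + rng.normal(size=len(D))
    for label, b in [("exchangeable", np.zeros(n, dtype=int)),
                     ("block-dependent", blocks)]:
        beta, se, _ = dyadic_regression_se(X, y, n, b)
        print(label, beta.round(3), se.round(3))
```

The double loop over dyad pairs is quadratic in the number of dyads, which is fine for a toy example; a practical implementation would accumulate the class sums more efficiently.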