Inference and Estimation for Network Data
Networks play a key role in many scientific domains, yet collecting network data is often expensive and time-consuming. This thesis analyzes several estimation problems in network inference, especially in cases where only partial data about the network is available.
First, we consider the latent space model for network data and propose a method that estimates the type, dimension, and curvature of the latent space. We prove that this method yields consistent estimates of these properties in a wide variety of settings. The second problem concerns network data collection. A common and cheaper alternative to full network data, known as Aggregated Relational Data (ARD), asks respondents "How many people do you know with trait X?" for various pre-determined traits. We show how to use ARD to estimate statistics of an unobserved network, such as the centrality of a node or network-level regression coefficients. We prove that this method yields consistent estimates of a broad class of network statistics under the assumption that the network is drawn from a common network model, such as the stochastic block model or the latent space model. This method allows researchers to analyze networks of interest without collecting full network data, and therefore makes network data collection faster and cheaper. The third problem concerns model selection for network data. Using the eigenvalues of the normalized adjacency matrix, we derive a testing procedure that allows researchers to select the most appropriate model from a collection of candidate models. We also show how this testing procedure applies when the researcher has access only to ARD and not to full network data.
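To make the two data objects named above concrete, the following is a minimal illustrative sketch in Python with NumPy. It is not the estimators or the test developed in this thesis. It simulates a small stochastic block model, forms ARD responses as counts of each node's neighbors with a given trait, and computes the eigenvalues of the normalized adjacency matrix, taken here to be D^{-1/2} A D^{-1/2}. The function names, block sizes, and edge probabilities are hypothetical, and setting the traits equal to block membership is an assumption made only for illustration.

```python
import numpy as np


def simulate_sbm(block_sizes, probs, rng):
    """Draw an undirected stochastic block model graph with no self-loops."""
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    edge_probs = probs[labels][:, labels]  # n x n matrix of edge probabilities
    # Sample the strict upper triangle, then mirror it to get a symmetric graph.
    upper = np.triu(rng.random(edge_probs.shape) < edge_probs, k=1)
    adjacency = (upper | upper.T).astype(int)
    return adjacency, labels


def aggregated_relational_data(adjacency, traits):
    """Entry (i, k) is the number of node i's neighbors that have trait k."""
    return adjacency @ traits


def normalized_adjacency_eigenvalues(adjacency):
    """Eigenvalues of D^{-1/2} A D^{-1/2}, in ascending order."""
    degrees = adjacency.sum(axis=1).astype(float)
    inv_sqrt = np.zeros_like(degrees)
    positive = degrees > 0  # isolated nodes keep a zero scaling factor
    inv_sqrt[positive] = degrees[positive] ** -0.5
    normalized = inv_sqrt[:, None] * adjacency * inv_sqrt[None, :]
    return np.linalg.eigvalsh(normalized)


# Hypothetical two-block example; all parameter values are illustrative only.
rng = np.random.default_rng(0)
probs = np.array([[0.30, 0.05],
                  [0.05, 0.30]])
adjacency, labels = simulate_sbm([60, 40], probs, rng)

# Assumption: one binary trait per block, so traits is an n x 2 indicator matrix.
traits = np.eye(2, dtype=int)[labels]
ard = aggregated_relational_data(adjacency, traits)
eigenvalues = normalized_adjacency_eigenvalues(adjacency)
```

In this sketch each row of `ard` plays the role of one respondent's answers to the "How many people do you know with trait X?" questions, and `eigenvalues` is the kind of spectral summary a model selection procedure could compare across candidate models.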