Learning in Latent Variable Models
Latent variable models are ubiquitous in many areas of statistics. In this talk, we consider two problems that involve inferring properties of a latent distribution when the only observed data are binary outcomes. We first consider the deconvolution problem in the semiparametric Rasch model. Item response theory typically involves the noisy measurement of an underlying latent trait through discrete test questions. Classically, many of these latent variable models are assumed to be parametric; more recently, these assumptions have been relaxed to allow nonparametric latent trait distributions. When the class of latent measures is left completely unrestricted, it is known that for any finite number of questions t the latent distribution is nonidentifiable. Nevertheless, we show that the nonparametric maximum likelihood estimator (NPMLE) is consistent as both the number of individuals n and the number of questions t grow, and we establish minimax optimality in 1-Wasserstein distance to the true latent distribution across different growth regimes of n relative to t. Additionally, we illustrate how the nonparametric maximum likelihood procedure can be used to construct goodness-of-fit tests, and we apply this to a dataset of responses from the National Alzheimer’s Coordinating Center.
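For concreteness, here is a minimal sketch of the model and estimator behind this analysis, in notation chosen for this summary rather than taken from the talk (the item difficulties β_j are treated as given; how they are handled in the actual analysis is not specified here). Individual i answers question j correctly with the usual Rasch logistic probability, the latent abilities are drawn from an unknown distribution G, and the NPMLE maximizes the marginal log-likelihood over all candidate distributions:

\[
\Pr(Y_{ij} = 1 \mid \theta_i, \beta_j) \;=\; \frac{e^{\theta_i - \beta_j}}{1 + e^{\theta_i - \beta_j}},
\qquad \theta_1, \dots, \theta_n \stackrel{\text{iid}}{\sim} G,
\]
\[
\widehat{G} \;\in\; \operatorname*{arg\,max}_{G} \; \sum_{i=1}^{n} \log \int \prod_{j=1}^{t}
\left( \frac{e^{\theta - \beta_j}}{1 + e^{\theta - \beta_j}} \right)^{Y_{ij}}
\left( \frac{1}{1 + e^{\theta - \beta_j}} \right)^{1 - Y_{ij}} \, dG(\theta),
\]

where the maximization runs over all probability distributions G on the real line. Consistency in 1-Wasserstein distance then refers to convergence of \(\widehat{G}\) to the true latent distribution as n and t grow.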
In the second project, we consider the network latent distance model. We introduce a novel method for estimating the curvature of the underlying manifold that is not restricted to a constant global curvature. The method, inspired by Toponogov’s Theorem, identifies the curvature of the latent space using only four points, provided one is the midpoint of two of the others. Additionally, we present theoretical results on the minimal distance from any point to the midpoint of two other independently sampled points. We establish consistency and asymptotic normality of the curvature estimator, which can then be used to test whether a distance matrix could have been generated by a manifold of constant curvature. We also present a novel distance estimator for the underlying latent space. We apply the method to a set of physics co-authorship networks, where we find evidence of non-constant curvature, and to the Los Alamos National Laboratory Netflow dataset, where we illustrate that estimated curvature detects a red team attack at a given alarm rate more effectively than simple graph motifs.
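As a sketch of why a single midpoint configuration can pin down curvature (assuming, for this illustration only, a latent space of constant curvature κ; the exact estimating equation used in the talk may differ), let m be the geodesic midpoint of x and y and let z be any fourth point. The Euclidean, spherical, and hyperbolic versions of the median (Apollonius) identity give

\[
\kappa = 0:\quad d(z,m)^2 \;=\; \tfrac{1}{2}\, d(z,x)^2 + \tfrac{1}{2}\, d(z,y)^2 - \tfrac{1}{4}\, d(x,y)^2,
\]
\[
\kappa > 0:\quad \cos\!\big(\sqrt{\kappa}\, d(z,m)\big)\cos\!\big(\tfrac{\sqrt{\kappa}}{2}\, d(x,y)\big)
\;=\; \tfrac{1}{2}\Big[\cos\!\big(\sqrt{\kappa}\, d(z,x)\big) + \cos\!\big(\sqrt{\kappa}\, d(z,y)\big)\Big],
\]
\[
\kappa < 0:\quad \cosh\!\big(\sqrt{-\kappa}\, d(z,m)\big)\cosh\!\big(\tfrac{\sqrt{-\kappa}}{2}\, d(x,y)\big)
\;=\; \tfrac{1}{2}\Big[\cosh\!\big(\sqrt{-\kappa}\, d(z,x)\big) + \cosh\!\big(\sqrt{-\kappa}\, d(z,y)\big)\Big],
\]

so the four pairwise distances among x, y, z, and the midpoint m determine κ, for example by solving the appropriate equation numerically; systematic departures from any constant-κ solution across different quadruples point to non-constant curvature.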