Leveraging network information to improve population size estimation in social and environmental applications
Final Exam presented by
Social and ecological processes contain network structures such as interpersonal relationships and the flow of water through a river system. This dissertation develops methods for using such network information to improve population size estimates in both social (Chapter 2) and ecological (Chapters 3-4) contexts.
The first project considers the problem of estimating the size of a human subpopulation that is hard to reach through traditional survey methods, and contributes a framework for studying network scale-up method (NSUM) estimator performance when certain modeling assumptions are violated. The NSUM is a cost-effective approach to estimating the size or prevalence of such a subpopulation, but it makes several strong assumptions, including the random mixing assumption that any two people are equally likely to know each other. The basic NSUM involves two steps: estimating respondents’ degrees (the number of people they know), then using these estimated degrees, along with the number of people each respondent reports knowing in the hard-to-reach subpopulation of interest, to estimate the prevalence of that subpopulation. Each of these two steps involves taking either an average of ratios or a ratio of averages, and using the ratio of averages at both steps has been the most common approach. However, we develop theoretical arguments that using the average of ratios at the second, prevalence-estimation step often yields lower mean squared error when the random mixing assumption is violated, as seems likely in practice; this estimator was proposed early in NSUM’s development but has been largely unexplored and unused. Simulation results using an example network data set support these findings. On the basis of this theoretical and empirical evidence, we suggest that future surveys using a simple estimator consider this mixed estimator, and that estimation methods built on it may yield further improvements. This joint work with Ian Laga, Xiaoyue Niu, and Tyler H. McCormick is published in Sociological Methodology.
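To make the distinction between the two prevalence estimators concrete, the following is a minimal Python sketch of the basic two-step NSUM. It assumes the standard basic scale-up setup in which each respondent reports how many people they know in several subpopulations of known size. The function names, argument names, and toy numbers are hypothetical and chosen for illustration only; they are not taken from the dissertation or its data.

```python
def degrees_ratio_of_averages(known_counts, known_sizes, total_pop):
    """Step 1: estimate each respondent's degree.

    known_counts[i][k] is the number of people respondent i reports knowing
    in known subpopulation k; known_sizes[k] is that subpopulation's size.
    Degree is total_pop * (sum of reported counts) / (sum of known sizes).
    """
    size_sum = sum(known_sizes)
    return [total_pop * sum(row) / size_sum for row in known_counts]


def prevalence_ratio_of_averages(hidden_counts, degrees):
    """Step 2, ratio of averages: total reported hidden ties over total degree."""
    return sum(hidden_counts) / sum(degrees)


def prevalence_average_of_ratios(hidden_counts, degrees):
    """Step 2, average of ratios: mean of each respondent's own ratio.

    Assumption for this sketch: respondents with an estimated degree of zero
    are dropped, since their ratio is undefined.
    """
    ratios = [y / d for y, d in zip(hidden_counts, degrees) if d > 0]
    return sum(ratios) / len(ratios)


# Toy data (made up): two respondents, two known subpopulations.
known_counts = [[2, 1], [4, 2]]
known_sizes = [100, 200]
total_pop = 1000
hidden_counts = [1, 4]  # reported ties to the hard-to-reach subpopulation

degrees = degrees_ratio_of_averages(known_counts, known_sizes, total_pop)
roa = prevalence_ratio_of_averages(hidden_counts, degrees)
# "Mixed" estimator: ratio of averages for degrees, average of ratios for prevalence.
mixed = prevalence_average_of_ratios(hidden_counts, degrees)
```

The two prevalence estimates differ whenever respondents' individual ratios differ, because the ratio of averages weights respondents by their estimated degree while the average of ratios weights them equally.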
The second project develops a class of scalable spatial stream network (S3N) models for estimation, inference, and prediction with spatial processes on stream networks at a spatial scale that was previously infeasible. Spatial process models are a standard approach to making regional estimates from point observations, but classically they account only for covariance based on straight-line (bird’s-eye) distance, and their computational complexity prevents them from scaling to large regions. Existing spatial stream network (SSN) models adapt such spatial processes to river networks by incorporating valid stream covariance functions, but preprocessing and estimation with these models are expensive, which precludes the analysis of regions at the multi-state and national level in the United States. Our contribution is an S3N model based on the SSN that uses nearest neighbor approximations and more efficient preprocessing to enable national and regional spatial process modeling on stream networks. We demonstrate the accuracy and computational efficiency of the S3N models relative to SSNs using simulated data on the Ohio River Basin stream network. This is joint work with Julian Olden and Tyler H. McCormick.
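As a rough illustration of why nearest neighbor approximations scale, the sketch below evaluates a Gaussian log-likelihood by conditioning each observation on at most m of its nearest previously ordered neighbors, in the spirit of Vecchia-type approximations. This is not the S3N model itself: the exponential covariance, the zero mean, the generic distance matrix (plain one-dimensional distance in the toy data rather than a stream distance), and all function names are assumptions made for this example.

```python
import math


def exp_cov(d, sigma2=1.0, range_=1.0):
    """Exponential covariance at distance d (an illustrative choice)."""
    return sigma2 * math.exp(-d / range_)


def solve_spd(a, b):
    """Solve a x = b for a symmetric positive definite matrix a via Cholesky."""
    n = len(b)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    z = [0.0] * n
    for i in range(n):  # forward substitution: low z = b
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: low^T x = z
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def nn_loglik(y, dist, m, sigma2=1.0, range_=1.0):
    """Zero-mean Gaussian log-likelihood with nearest neighbor conditioning.

    Observation i is conditioned only on its (at most) m nearest neighbors
    among observations 0..i-1, so each term needs an m-by-m solve instead of
    one n-by-n factorization.
    """
    total = 0.0
    for i in range(len(y)):
        nbrs = sorted(range(i), key=lambda j: dist[i][j])[:m]
        mean, var = 0.0, exp_cov(0.0, sigma2, range_)
        if nbrs:
            c_nn = [[exp_cov(dist[p][q], sigma2, range_) for q in nbrs] for p in nbrs]
            c_in = [exp_cov(dist[i][j], sigma2, range_) for j in nbrs]
            w = solve_spd(c_nn, c_in)  # kriging weights on the neighbors
            mean = sum(wk * y[j] for wk, j in zip(w, nbrs))
            var -= sum(wk * ck for wk, ck in zip(w, c_in))
        total += -0.5 * (math.log(2 * math.pi * var) + (y[i] - mean) ** 2 / var)
    return total


# Toy data (made up): five sites ordered along a line.
points = [0.0, 0.7, 1.5, 2.1, 3.4]
dist = [[abs(p - q) for q in points] for p in points]
y = [0.3, -0.1, 0.8, 0.5, -0.4]

approx = nn_loglik(y, dist, m=1)
full = nn_loglik(y, dist, m=len(y) - 1)  # conditioning on all earlier sites
```

In this one-dimensional toy case a single neighbor already reproduces the full likelihood up to rounding, because the exponential covariance on a line is Markov. On a real network the approximation generally trades some accuracy for a per-observation cost that depends on m rather than on the total number of sites.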
The third project applies the S3N models developed in the second project to obtain what is to our knowledge the first set of fish population size estimates for over 300 species across the entire Ohio River Basin. Estimation on this scale was previously not possible, and the approach we demonstrate can be used to estimate freshwater fish populations by species over large regions. These estimates represent a critical step for biodiversity monitoring and conservation planning, as the geographic distribution of freshwater fish species at a national scale is currently unknown. Our publicly available code makes national and regional fish population size estimation accessible to the wider research community. This is joint work with Julian Olden and Tyler H. McCormick.