In this talk, we consider two instances of sampling bias in estimating population size.

In the first project, we evaluate the robustness of simple network scale-up estimators to violations of the random mixing assumption. The network scale-up method (NSUM) is a cost-effective way to estimate the prevalence of a hard-to-reach subpopulation H from a standard survey. The basic NSUM has two steps: estimate each respondent's degree by scaling up the number of people they report knowing in groups of known size, then estimate the prevalence of H by scaling up the proportion of their contacts who are in H. Each step can take either an average of ratios or a ratio of averages, yielding four possible estimators. Using the ratio of averages at both steps has so far been the most common choice, but to our knowledge the four estimators have not been systematically compared. We present theoretical and empirical evidence that using the average of ratios at the second, prevalence-estimation step often yields lower mean squared error when the random mixing assumption is violated, as it frequently is in practice. We suggest that future surveys relying on a simple estimator consider this mixed estimator, and that estimation methods built on it may yield further improvements.
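To make the four combinations concrete, here is a minimal sketch of the two scale-up steps; the function name, array layout, and toy inputs are our own illustration, not the talk's code.

```python
import numpy as np

def nsum_estimators(y_known, group_sizes, y_hidden, N):
    """Compute the four simple NSUM prevalence estimators.

    y_known:     (n, K) array; respondent i's reported contacts in known group k
    group_sizes: (K,) array of known group sizes N_k
    y_hidden:    (n,) array; respondent i's reported contacts in H
    N:           total population size
    """
    # Step 1: degree estimates, by ratio of averages (RoA) or average of ratios (AoR)
    d_roa = N * y_known.sum(axis=1) / group_sizes.sum()
    d_aor = N * (y_known / group_sizes).mean(axis=1)
    # Step 2: prevalence estimates; keys are (degree step, prevalence step)
    out = {}
    for name, d in [("RoA", d_roa), ("AoR", d_aor)]:
        out[(name, "RoA")] = y_hidden.sum() / d.sum()  # ratio of averages
        out[(name, "AoR")] = (y_hidden / d).mean()     # average of ratios
    return out
```

The mixed estimator discussed above corresponds to the keys whose second entry is "AoR".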

In the second project, we build on recent statistical models for the spatial distribution of freshwater fish. Over the last twenty years, Ver Hoef, Peterson, and others have developed models whose covariance functions depend on distance along the directed river network rather than on Euclidean distance. We investigate scaling these models to estimate fish populations over much larger regions, where a major challenge is sampling bias. We discuss our progress, ideas, and next steps.
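As one illustration of this model family, the sketch below implements a tail-up exponential covariance, a common component of stream-network models: covariance is nonzero only for flow-connected pairs, decays exponentially in stream distance, and is rescaled by spatial weights so the model remains valid on a branching network. The function name, parameterization, and toy inputs are our own simplifying assumptions, not taken from any specific paper or package.

```python
import numpy as np

def tail_up_exp_cov(stream_dist, weights, sigma2=1.0, alpha=1.0):
    """Tail-up exponential covariance on a stream network (sketch).

    stream_dist: (n, n) array of distances measured along the river network
    weights:     (n, n) array of spatial weights; zero for pairs that are
                 not flow-connected, which is what makes this "tail-up"
    sigma2:      partial sill (variance at zero distance)
    alpha:       range parameter controlling the rate of decay
    """
    return sigma2 * weights * np.exp(-stream_dist / alpha)
```

With two flow-connected sites one distance unit apart and weight 0.5, the off-diagonal covariance is 0.5 * sigma2 * exp(-1/alpha); a zero weight removes the covariance entirely, regardless of distance.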