Preferential sampling and model checking in phylodynamic inference
Estimating population size fluctuations is one of the key tasks in Ecology. However, traditional sampling based approaches to perform this task have limitations when populations of interest are extinct or are hard to reach, as is the case for individuals infected for a short time period by a pathogen. Phylodynamics combines coalescent theory from population genetics and statistical modeling to estimate fluctuations of effective population size --- an idealized quantity that can be mapped to census population size with additional demographical information --- from molecular sequences of individuals sampled from a population of interest. However, when analyzing sequences sampled serially through time, many methods implicitly assume that sampling times do not depend on the size of the effective population size. We show that, when sampling times do probabilistically depend on effective population size, estimation methods that do not account for this dependence may be systematically biased. We propose a model that accommodates preferentially sampled data by modeling the distribution of sampling times as an inhomogeneous Poisson process dependent on effective population size via a log-linear intensity function. We proceed to extend our model to be able to include additional time-varying covariates into the intensity function. Via simulations and via recent influenza and Ebola datasets, we demonstrate that our model not only reduces bias, but also improves estimation precision. Finally, we propose and implement a posterior predictive diagnostic method to check the adequacy of the coalescent and sampling time models.