Body

Communities of microscopic organisms (microbiomes) play essential roles in all aspects of life, including (at global scales) the maintenance of biogeochemical cycles and sustenance of global food webs, and (at human scales) nutrient and energy acquisition, and immune system development and regulation. However, many off-the-shelf statistical tools are poorly suited to the analysis of modern microbiome data. In this dissertation, I develop three methods, each targeting a different biologically motivated microbial estimand. In my first project, I develop a method to compare estimated gene-level evolutionary histories to estimated genome-level evolutionary histories, with the goal of distilling large amounts of evolutionary information into a low-dimensional visualization. In my second and third projects, I consider the problem of differential abundance, that is, the identification of microbial categories whose abundances differ significantly across environmental covariates. In my second project, I extend an existing differential abundance method to be computationally efficient for tens of thousands of microbial species. My method yields conclusions nearly identical to those of the original method while requiring a fraction of the computational time and memory. In my third project, I propose a model for abundances of molecular functions, and investigate parameter identifiability and estimation under model misspecification.