Learning and Inference with Limited Data: Applications in Ecology and Causal Inference
General Exam presented by Medha Agarwal
I present two projects that address complementary challenges in statistical machine learning under missing and incomplete data. Both projects explore how to make inferences when information is limited.
The first project develops an end-to-end modeling approach for predicting wildlife behavior from sparsely labeled ecological sensor data. A central aim of ecological research is to understand how environmental changes shape wildlife behavior and conservation outcomes. This research increasingly relies on biologging data (time-stamped sensor recordings that capture an animal’s movement and environment), but such data is inherently noisy and only sparsely labeled. Using a decade of biologging data from one of Botswana’s most endangered predatory species, the African wild dog, the approach I propose integrates dataset curation, representation learning, behavior prediction, and uncertainty quantification via conformal prediction. The resulting model achieves high predictive accuracy and enables the examination of how environmental conditions drive shifts in behavioral patterns over time.
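As a rough illustration of the uncertainty quantification step, the sketch below shows split conformal prediction for a classifier. It is not the project's implementation: the function names, the nonconformity score (one minus the predicted probability of the true class), and the miscoverage level `alpha` are all assumptions chosen for illustration.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Score threshold from a held-out, labeled calibration set.

    cal_probs:  (n, K) array of predicted class probabilities (hypothetical input).
    cal_labels: (n,) array of integer class labels in {0, ..., K-1}.
    """
    n = len(cal_labels)
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected quantile level, capped at 1.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def prediction_sets(test_probs, qhat):
    """For each test point, return every class whose score is within the threshold."""
    return [np.flatnonzero(1.0 - p <= qhat) for p in test_probs]

# Toy usage with made-up probabilities for three behavior classes.
cal_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.2, 0.2, 0.6],
                      [0.5, 0.4, 0.1]])
cal_labels = np.array([0, 1, 2, 0])
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
sets = prediction_sets(np.array([[0.6, 0.3, 0.1],
                                 [0.5, 0.5, 0.0]]), qhat)
```

If calibration and test points are exchangeable, sets built this way contain the true label with probability at least 1 − alpha. An ambiguous sensor window then yields a larger set of candidate behaviors instead of a single overconfident label.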
The second project addresses, from a theoretical perspective, an incomplete-data problem in causal inference, where only one potential outcome is observed per unit while the others remain unobserved. I propose a distribution treatment effect (DTE) that, unlike classical estimands such as the average treatment effect, quantifies differences over entire potential outcome distributions. Building on the non-parametric counterfactual mean embedding (CME) framework introduced by Muandet et al. (2021), which embeds potential outcome distributions in Hilbert spaces, I introduce a DTE based on the Sinkhorn divergence between counterfactual distributions. I construct a debiased estimator of this divergence by establishing its second-order pathwise differentiability, which allows me to obtain an asymptotically valid hypothesis test for a treatment effect. The proposed test has potential applications in off-policy evaluation and in assessing treatment effects in vaccine trials.
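For orientation, the standard definition of the Sinkhorn divergence is written out below, together with one plausible way to read the DTE and the associated null hypothesis. The notation (the cost c, the regularization parameter ε, and the potential outcomes Y(0), Y(1) for a binary treatment) is assumed here for illustration and may differ from the notation used in the project.

```latex
% Entropy-regularized optimal transport between distributions P and Q,
% with cost c and regularization \varepsilon > 0:
\[
  \mathrm{OT}_{\varepsilon}(P, Q)
  = \min_{\pi \in \Pi(P, Q)} \int c(x, y)\, \mathrm{d}\pi(x, y)
    + \varepsilon\, \mathrm{KL}\bigl(\pi \,\|\, P \otimes Q\bigr).
\]
% Sinkhorn divergence: the debiased version of \mathrm{OT}_{\varepsilon}.
\[
  S_{\varepsilon}(P, Q)
  = \mathrm{OT}_{\varepsilon}(P, Q)
    - \tfrac{1}{2}\, \mathrm{OT}_{\varepsilon}(P, P)
    - \tfrac{1}{2}\, \mathrm{OT}_{\varepsilon}(Q, Q).
\]
% Assumed form of the estimand and the null hypothesis:
\[
  \mathrm{DTE}_{\varepsilon} = S_{\varepsilon}\bigl(P_{Y(1)}, P_{Y(0)}\bigr),
  \qquad
  H_0 \colon P_{Y(1)} = P_{Y(0)}.
\]
```

Under suitable conditions on the cost, the Sinkhorn divergence is nonnegative and vanishes only when the two distributions coincide, so in this reading a nonzero DTE indicates a difference somewhere in the outcome distributions, not only in their means.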