There is growing appreciation of the perils of naively using the same data for both model selection and subsequent inference; such “double-dipping” is now frowned upon in many disciplines. Sample splitting has become the de facto remedy, but it is only one possible answer to the challenge of performing inference on data-driven hypotheses. Indeed, in some cases, e.g., with dependent data or with unsupervised methods like clustering, it is not clear how to conduct sample splitting appropriately. The statistical sub-field of “selective inference” embraces this challenge, with the goal of offering methods that are both valid and powerful in this setting. In this talk, I’ll discuss some alternatives to sample splitting drawn from recent and ongoing statistical work, based on both conditioning and data thinning. The “conditional” approach adjusts for exploratory analyses by conditioning on the selection event. I’ll review some cases where this approach has been used successfully, but note that it requires developing bespoke and potentially technically challenging methods. Another alternative is “data thinning,” which can allow greater flexibility in selection and can be especially useful in settings where sample splitting is not readily applicable. I’ll illustrate both approaches by discussing some ongoing projects. This presentation will include joint work with Daniela Witten, Ethan Ancell, and Olivia McGough.