Coreset selection for the Sinkhorn divergence and generic smooth divergences
General Exam presented by Alex March Kokot

Either explicitly or implicitly, the alignment of data distributions and functionals of interest is central to many problems in Statistics. Typical examples include debiasing in Causal Inference, Bayesian analyses, model selection, and dimension reduction, each of which aims either to uncover or to exploit latent structure within a dataset.
One of the most transparent settings in which this perspective applies is coreset selection: the reduction of a dataset to a small collection of landmark points. In this talk, I frame this problem as risk minimization, formalizing the goal of distribution approximation as the minimization of a divergence of interest. We develop a framework applicable to sufficiently regular losses, relate our approach to a method of moments, and implement it efficiently via low-rank matrix approximation in an algorithm we call CO2.
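The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of coreset selection as risk minimization: a Gaussian-kernel MMD stands in for the "divergence of interest," and a brute-force greedy selection rule stands in for CO2's low-rank implementation. All function names are hypothetical and not part of the talk.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    # Pairwise Gaussian kernel matrix between rows of X and rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def greedy_kernel_coreset(X, m, bandwidth=1.0):
    """Greedily pick m landmark points whose empirical distribution
    minimizes a squared-MMD criterion against the full dataset.
    Illustrative stand-in only; not the CO2 algorithm from the talk."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, bandwidth)   # full kernel (moment) matrix
    target = K.mean(axis=1)                # kernel mean embedding of the data
    selected = []
    for _ in range(m):
        best, best_val = None, np.inf
        for j in range(n):
            if j in selected:
                continue
            idx = selected + [j]
            Ks = K[np.ix_(idx, idx)]
            # Squared MMD between the uniform coreset measure and the data,
            # up to a constant that does not depend on the selection.
            val = Ks.mean() - 2 * target[idx].mean()
            if val < best_val:
                best, best_val = j, val
        selected.append(best)
    return np.array(selected)

# Usage: a 10-point coreset from 500 samples of a two-component Gaussian mixture.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (250, 2)), rng.normal(4, 1, (250, 2))])
landmarks = greedy_kernel_coreset(X, m=10)
```

In practice the full n-by-n kernel matrix would be replaced by a low-rank approximation, which is the efficiency gain the talk attributes to CO2.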
As an application of this method, we consider coreset selection for the Sinkhorn divergence, which yields a smoothed version of K-means clustering. To carry this out, we build on recent results in entropic optimal transport. By linearizing an appropriate parameterization of the Schrödinger system of equations, we verify Hadamard differentiability of key functionals, such as the entropic potentials, in the weak topology of the Gaussian-kernel RKHS. This allows us to achieve lossless Sinkhorn compression with only polylogarithmically many coreset points.
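For reference, these are the standard definitions of the entropic optimal transport cost and the Sinkhorn divergence, for a cost c and regularization parameter ε, together with the Schrödinger system satisfied by the entropic potentials (f, g); the specific parameterization linearized in the talk is not reproduced here.

```latex
\[
\mathrm{OT}_\varepsilon(\mu,\nu)
  = \min_{\pi \in \Pi(\mu,\nu)} \int c \, d\pi
    + \varepsilon \,\mathrm{KL}(\pi \,\|\, \mu \otimes \nu),
\qquad
S_\varepsilon(\mu,\nu)
  = \mathrm{OT}_\varepsilon(\mu,\nu)
    - \tfrac12 \mathrm{OT}_\varepsilon(\mu,\mu)
    - \tfrac12 \mathrm{OT}_\varepsilon(\nu,\nu),
\]
\[
f(x) = -\varepsilon \log \int \exp\!\Big(\tfrac{g(y) - c(x,y)}{\varepsilon}\Big)\, d\nu(y),
\qquad
g(y) = -\varepsilon \log \int \exp\!\Big(\tfrac{f(x) - c(x,y)}{\varepsilon}\Big)\, d\mu(x).
\]
```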
We highlight directions for future research, particularly in entropic optimal transport, where these proof techniques have promising applications. Of particular interest, we discuss how the CO2 algorithm relates to spectral clustering and our progress toward verifying an asymptotic equivalence between these objectives.