In this talk I will present contributions to representation learning for partitioning problems. Classical statistical learning methods, such as kernel-based methods, require the specification of a function class (or a collection thereof) in which the function to be learned is assumed to live. This requirement translates into the specification of a mapping of the data into a feature space via a so-called feature representation. The statistical performance of a kernel-based method can, however, be impeded if this feature representation is improperly specified.

As larger datasets become available, one can contemplate tackling representation learning jointly with partitioning problems such as cluster analysis or change-point estimation. When large labeled datasets can indeed be collected, I shall adopt an end-to-end learning approach to these questions, learning all parameters jointly with gradient-based optimization algorithms. I will show that with this approach one can learn similarity measures from data for similarity-based discriminative clustering methods.
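To make the idea of learning a similarity measure from labeled data concrete, here is a minimal sketch, not the method from the talk: a diagonal feature weighting is fit by gradient descent on a simple contrastive objective, so that labeled same-cluster pairs become closer than different-cluster pairs. All names and the toy data are illustrative.

```python
import numpy as np

def pair_dist(w, x, y):
    # Weighted squared distance d_w(x, y) = sum_i w_i * (x_i - y_i)^2.
    return np.dot(w, (x - y) ** 2)

def learn_similarity(same_pairs, diff_pairs, dim, lr=0.05, steps=200):
    """Gradient descent on a contrastive objective: minimize distances of
    same-cluster pairs minus distances of different-cluster pairs, keeping
    the weights non-negative and summing to one."""
    w = np.ones(dim) / dim
    for _ in range(steps):
        grad = np.zeros(dim)
        for x, y in same_pairs:
            grad += (x - y) ** 2       # pull same-cluster pairs together
        for x, y in diff_pairs:
            grad -= (x - y) ** 2       # push different-cluster pairs apart
        w -= lr * grad
        w = np.clip(w, 0.0, None)      # project back onto the simplex
        w /= w.sum()
    return w

# Toy data: feature 0 separates the clusters, feature 1 is pure noise.
same = [(np.array([0.0, 0.0]), np.array([0.1, 0.9])),
        (np.array([1.0, 0.2]), np.array([1.1, 0.8]))]
diff = [(np.array([0.0, 0.5]), np.array([1.0, 0.5]))]
w = learn_similarity(same, diff, dim=2)
# The learned weighting concentrates on the informative feature.
```

In an end-to-end setting, the diagonal weighting would be replaced by a parameterized feature map differentiated automatically, but the training loop has the same shape.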

In the main part of the talk, I will present contributions to cluster analysis and change-point estimation within a unified framework that can incorporate any amount of labeled data. For cluster analysis, I will introduce an approach that learns the feature representation for the purpose of clustering, assuming the feature representation is a compositional mapping amenable to automatic differentiation with respect to its parameters. The approach can leverage partially labeled data during training. For change-point estimation, I will present an approach in the same spirit, similarly leveraging partially annotated time series with labeled change points. Numerical results on simulated and real data demonstrate the merits of the approach.
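For readers unfamiliar with change-point estimation, the following sketch shows the classical least-squares formulation that such approaches build on; it is a generic baseline, not the talk's method. A single change point in the mean is located by scanning all candidate splits and picking the one minimizing the within-segment squared error.

```python
import numpy as np

def best_change_point(x):
    """Scan candidate split points t and return the one minimizing the
    total within-segment squared error under a piecewise-constant
    mean model (one change point)."""
    best_t, best_cost = None, np.inf
    for t in range(1, len(x)):
        left, right = x[:t], x[t:]
        cost = (((left - left.mean()) ** 2).sum()
                + ((right - right.mean()) ** 2).sum())
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

# Toy signal: mean jumps from 0 to 1 at index 50, plus a small perturbation.
signal = np.concatenate([np.zeros(50), np.ones(50)]) + 0.01 * np.sin(np.arange(100))
t_hat = best_change_point(signal)  # recovers the split at index 50
```

The approaches in the talk replace the fixed squared-error cost with one computed in a learned feature representation, whose parameters are trained by gradient-based optimization on partially annotated series.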

In the final part of the talk, I will describe a change-point estimation method for data consisting of time series of point clouds, that is, discrete probability measures. I will connect the method to distances between probability distributions; this connection allows one to define an appropriate reproducing kernel and thereby tackle time series with thousands of point clouds, each with thousands of points. The work is motivated by an oceanographic application in which flow cytometry point cloud data on phytoplankton are collected continuously during research cruises. I will illustrate the utility of the proposed method on a flow cytometry dataset, as well as the potential to estimate the number of change points using auxiliary data. This is a critical step toward understanding how phytoplankton community structure varies over space and time.
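One standard kernel-based distance between discrete probability measures is the maximum mean discrepancy (MMD); the sketch below uses it to compare point clouds and locate a distributional change, as a simplified stand-in for the method described in the talk. The data, bandwidth, and scan statistic are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    # Pairwise Gaussian kernel matrix between point sets a (n, d) and b (m, d).
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * bandwidth ** 2))

def mmd2(a, b, bandwidth=1.0):
    """Squared maximum mean discrepancy between two point clouds, i.e.,
    a kernel distance between their empirical probability measures."""
    kaa = gaussian_kernel(a, a, bandwidth).mean()
    kbb = gaussian_kernel(b, b, bandwidth).mean()
    kab = gaussian_kernel(a, b, bandwidth).mean()
    return kaa + kbb - 2 * kab

def best_split(clouds):
    # Pick the split maximizing the discrepancy between pooled segments.
    scores = [mmd2(np.vstack(clouds[:t]), np.vstack(clouds[t:]))
              for t in range(1, len(clouds))]
    return 1 + int(np.argmax(scores))

# Toy series of point clouds whose distribution shifts halfway through.
rng = np.random.default_rng(0)
clouds = ([rng.normal(0.0, 1.0, size=(100, 2)) for _ in range(10)]
          + [rng.normal(3.0, 1.0, size=(100, 2)) for _ in range(10)])
t_hat = best_split(clouds)  # the detected split, index 10 here
```

The quadratic cost of the kernel evaluations is what motivates the scalable reproducing-kernel construction mentioned in the talk for series with thousands of clouds of thousands of points each.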