High-dimensional Change-point Detection Using Generalized Homogeneity Metrics
Change-point detection is a classical problem in statistics and econometrics. It has been studied extensively in the fixed-dimensional setting and for detecting changes in the mean or the covariance structure of a sequence of observations. This work focuses on detecting abrupt changes in the data-generating distribution beyond the first two moments, a problem that remains substantially less explored in the high-dimensional setting. We develop a nonparametric methodology to detect an unknown number of change-points in a sequence of independent high-dimensional observations and to test the significance of the estimated change-point locations. Our approach rests on nonparametric tests for the homogeneity of two high-dimensional distributions. We construct a single change-point location estimator by defining a cumulative sum process in an embedded Hilbert space. As the key theoretical innovation, we rigorously derive its limiting distribution under the high dimension medium sample size (HDMSS) framework. We then combine our statistic with the idea of wild binary segmentation to recursively estimate and test for multiple change-point locations. The superior performance of our methodology compared to existing procedures is illustrated on both simulated and real datasets.
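To make the idea of a single change-point estimator built on a two-sample homogeneity test concrete, here is a minimal sketch in Python. It is not the statistic developed in this work: as a stand-in it uses the classical energy distance with the Euclidean norm, scans every admissible split point, and returns the split that maximizes a weighted two-sample statistic. The function names, the weight k(n - k)/n, and the `min_seg` parameter are assumptions made for illustration only, and the sketch includes neither the significance test nor the wild binary segmentation step.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance matrix between the rows of x (shape n x p)."""
    sq = np.sum(x * x, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    d = np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives caused by rounding
    np.fill_diagonal(d, 0.0)
    return d


def scan_single_change_point(x, min_seg=5):
    """Return (k_hat, statistic) for a single change-point scan.

    k_hat is the size of the left segment, i.e. the estimated change occurs
    after observation k_hat. Illustrative stand-in, not the paper's statistic.
    """
    n = x.shape[0]
    if min_seg < 2 or n < 2 * min_seg:
        raise ValueError("need min_seg >= 2 and at least 2 * min_seg rows")
    d = pairwise_distances(x)
    best_k, best_stat = -1, -np.inf
    for k in range(min_seg, n - min_seg + 1):
        m = n - k
        # U-statistic estimates of the mean within-segment distances
        within_left = d[:k, :k].sum() / (k * (k - 1))
        within_right = d[k:, k:].sum() / (m * (m - 1))
        # mean distance between the two segments
        between = d[:k, k:].mean()
        energy = 2.0 * between - within_left - within_right
        stat = (k * m / n) * energy  # assumed CUSUM-style weight
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, float(best_stat)


# Toy data (assumed for illustration): 100 observations in dimension 100,
# with the scale of the distribution changing after observation 50.
rng = np.random.default_rng(0)
x = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 100)),
    rng.normal(0.0, 3.0, size=(50, 100)),
])
k_hat, stat = scan_single_change_point(x)
print("estimated change-point:", k_hat)
```

The toy example uses a change in scale because the Euclidean energy distance picks it up easily; detecting changes beyond the first two moments in high dimension is where a differently constructed homogeneity metric is needed. The scan recomputes segment sums for every split, so it costs O(n^3) after the distance matrix is built, which is acceptable for a sketch but not for long sequences.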