Graphs for Statistical Learning and Modeling
A graph, consisting of a set of vertices and a set of edges, is a geometric object that can not only visualize but also mathematically characterize the geometric structures in data. Graphs also model relations or connections between different units and have applications in fields such as epidemiology, sociology, biology, and chemistry. We first take advantage of graphs from a geometric perspective. We propose a data analysis framework that constructs weighted graphs, called skeletons, to encode the geometric structures in data, and that uses the learned geometric information to assist downstream analysis tasks such as clustering and regression.
For clustering, we introduce a density-aided method called skeleton clustering that can detect clusters with irregular shapes in multivariate and even high-dimensional data. To bypass the curse of dimensionality, we propose surrogate density measures that are less dependent on the dimension but have intuitive geometric interpretations. The clustering framework constructs a concise representation of the given data as an intermediate step and can be thought of as a combination of prototype methods, density-based clustering, and hierarchical clustering. We show by theoretical analysis and empirical studies that skeleton clustering leads to reliable clusters in multivariate and high-dimensional scenarios.
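To make the three ingredients concrete, the following is a minimal sketch of one way such a pipeline could look, not the method as specified in this work. It assumes prototypes ("knots") found by k-means, a hypothetical surrogate density for each pair of knots (the number of points whose two nearest knots are that pair, divided by the distance between the knots), and single-linkage merging of knots in order of decreasing edge weight. The function names, the deterministic initialization, and the toy data are all illustrative assumptions.

```python
import math
import random


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm. Initial centers are evenly spaced data points,
    an assumption made only to keep this sketch reproducible."""
    centers = points[:: len(points) // k][:k]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty group keeps its previous center
                centers[j] = tuple(sum(x) / len(g) for x in zip(*g))
    return centers


def skeleton_cluster(points, n_knots, n_clusters):
    # Step 1 (prototype method): summarize the data by a few knots.
    knots = kmeans(points, n_knots)

    # Step 2 (density-based weights): count the points whose two nearest
    # knots are (i, j), then normalize by the distance between the knots.
    # This particular weight is an assumed surrogate, chosen for illustration.
    weight, nearest = {}, []
    for p in points:
        order = sorted(range(n_knots), key=lambda c: math.dist(p, knots[c]))
        nearest.append(order[0])
        edge = (min(order[0], order[1]), max(order[0], order[1]))
        weight[edge] = weight.get(edge, 0) + 1
    for i, j in weight:
        weight[(i, j)] /= max(math.dist(knots[i], knots[j]), 1e-12)

    # Step 3 (hierarchical clustering): single linkage on the knots, merging
    # along the heaviest edges first until n_clusters components remain.
    parent = list(range(n_knots))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = n_knots
    for (i, j), _ in sorted(weight.items(), key=lambda kv: -kv[1]):
        if components <= n_clusters:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1

    # Step 4: each point inherits the cluster of its nearest knot.
    return [find(c) for c in nearest]


# Toy data: two well-separated Gaussian blobs of 100 points each.
rng = random.Random(0)
points = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(100)]
points += [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(100)]
labels = skeleton_cluster(points, n_knots=4, n_clusters=2)
print(len(set(labels)), "clusters found")
```

In this sketch the pairwise weights depend only on distances to knots, so their cost does not grow with a density estimate in the ambient dimension. If the knot graph has more connected components than `n_clusters`, the merging step simply stops with more clusters than requested.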
For regression tasks, we propose a novel framework specialized for covariates concentrated around low-dimensional geometric structures. The proposed framework first learns a graph representation of the covariates, which we call the skeleton, to summarize the geometric structures. We then apply nonparametric regression techniques to estimate the regression function on the skeleton, which, notably, bypasses the curse of dimensionality. We derive statistical and computational properties of the proposed regression framework and use simulations and real data examples to illustrate its effectiveness. The framework can account for predictors drawn from distinct geometric structures and is robust to additive noise and noisy observations.
Graphs, as structures that represent connections, are also a helpful tool for modeling contact networks, which are incorporated into various epidemic models. However, missing links in the observed contact network are inevitable, which raises concerns about the robustness of epidemic models to such missingness. To address this concern, we assess epidemic models under missingness and present some preliminary results from this ongoing project.
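As a rough illustration of the kind of question involved, the sketch below compares a simulated outbreak on a full contact network with one on an observed network from which links are missing. Everything here is an assumption made for illustration rather than the design of this project: the ring-lattice network, links missing uniformly at random, a simple chain-binomial (Reed-Frost style) epidemic in which each infectious node transmits to each susceptible neighbor with probability `beta` and then recovers, and the function names themselves.

```python
import random


def ring_lattice(n, k):
    """Edges of a ring in which each node links to its k nearest neighbors on each side."""
    return [(i, (i + d) % n) for i in range(n) for d in range(1, k + 1)]


def drop_edges(edges, frac_missing, rng):
    """Return the links that remain observed after a random fraction goes missing."""
    n_keep = round(len(edges) * (1 - frac_missing))
    return rng.sample(edges, n_keep)


def final_size(n, edges, beta, source, rng):
    """Number of nodes ever infected in one run of the chain-binomial epidemic."""
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    ever_infected = {source}
    infectious = [source]
    while infectious:
        newly_infected = []
        for i in infectious:
            for j in neighbors[i]:
                if j not in ever_infected and rng.random() < beta:
                    ever_infected.add(j)
                    newly_infected.append(j)
        infectious = newly_infected  # the previous generation has recovered
    return len(ever_infected)


def mean_final_size(n, edges, beta, rng, reps=200):
    """Average final size over repeated runs started from random source nodes."""
    total = sum(final_size(n, edges, beta, rng.randrange(n), rng) for _ in range(reps))
    return total / reps


rng = random.Random(1)
n = 200
full = ring_lattice(n, k=2)
observed = drop_edges(full, frac_missing=0.3, rng=rng)

size_full = mean_final_size(n, full, beta=0.5, rng=rng)
size_obs = mean_final_size(n, observed, beta=0.5, rng=rng)
print(f"mean final size, full network:     {size_full:.1f}")
print(f"mean final size, observed network: {size_obs:.1f}")
```

The gap between the two averages is one simple summary of how sensitive a model output is to missing links. Other missingness mechanisms, network models, and epidemic models would need their own versions of `drop_edges`, `ring_lattice`, and `final_size`.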