The Folded Concave Laplacian Spectral Penalty Learns Block Diagonal Sparsity Patterns with the Strong Oracle Property
Structured sparsity is an important part of the modern statistical toolkit. A large body of research has shown that sparsity inducing penalties such as the group or fused Lasso (and their concave generalizations) can learn structured sparsity patterns with appealing computational and statistical guarantees. We say a set of model parameters has block diagonal sparsity up to permutations if its elements can be viewed as the edges of a graph and this graph has multiple connected components. For example, a block diagonal correlation matrix with K blocks of variables can be viewed as a graph with K connected components whose nodes are the variables and whose (real valued) edges are the correlations. This type of sparsity captures clusters of model parameters. To learn block diagonal sparsity patterns we develop the folded concave Laplacian spectral penalty and show that this penalty comes with appealing computational and statistical guarantees. While this penalty is non-convex, it is majorized by a (convex) weighted Lasso, thus a stationary point can be computed using the local linear approximation (LLA) algorithm. Under mild conditions (e.g. the existence of a good enough initial estimate) the LLA algorithm converges exactly to the oracle estimator after two steps with high probability, even in high-dimensional settings. The theory is then shown to apply to several classical problems including: covariance estimation, linear regression, and logistic regression.