PACCAR Hall

We consider estimation of parameters defined as linear functionals of solutions to linear inverse problems. Any such parameter admits a doubly robust representation that depends on the solution to a dual linear inverse problem, where the dual solution can be thought of as a generalization of the inverse propensity function.
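A minimal illustration of this structure in the simplest unconfoundedness setting (the symbols $\theta_0$, $\gamma_0$, $\alpha_0$, and $\pi$ are generic notation introduced here, not necessarily the speaker's): for a parameter $\theta_0 = \mathbb{E}[m(W, \gamma_0)]$ that is linear in the primal solution $\gamma_0$, a doubly robust moment takes the form
\[
\theta_0 \;=\; \mathbb{E}\bigl[\, m(W, \gamma) + \alpha(X)\,\{Y - \gamma(X)\} \,\bigr],
\]
which remains valid when either $\gamma = \gamma_0$ or $\alpha = \alpha_0$, with $\alpha_0$ solving the dual inverse problem. For the average treatment effect with $\gamma_0(d, x) = \mathbb{E}[Y \mid D = d, X = x]$ and propensity $\pi(x) = \mathbb{P}(D = 1 \mid X = x)$, the dual solution is
\[
\alpha_0(D, X) \;=\; \frac{D}{\pi(X)} - \frac{1 - D}{1 - \pi(X)},
\]
exactly the inverse propensity function, which is the sense in which the dual solution generalizes it.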

Authors: Yanqin Fan, Brendan Pass, and Xuetao Shi

Under mild regularity conditions, the underlying probability measure of random objects, which are metric-space valued data, can be characterized by distance profiles: the one-dimensional distributions of probability mass falling into balls of increasing radius.
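As a concrete illustration, here is a minimal sketch of computing an empirical distance profile for Euclidean data (the function name and sample sizes are illustrative assumptions, and any metric could replace the Euclidean norm):

```python
import numpy as np

def distance_profile(X, i, radii):
    """Empirical distance profile of the point X[i]: for each radius r,
    the fraction of sample points within distance r of X[i]."""
    d = np.linalg.norm(X - X[i], axis=1)        # distances from X[i] to every point
    return np.array([(d <= r).mean() for r in radii])

# Usage: each profile is a one-dimensional CDF indexed by radius.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                   # 200 random objects in R^3
radii = np.linspace(0.0, 4.0, 50)
p = distance_profile(X, 0, radii)               # nondecreasing values in [0, 1]
```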

I will discuss various topics at the intersection of machine learning, Bayesian methods, and the estimation of causal effects, focusing on the estimation of conditional average treatment effects (CATEs). I make the following claims:

1. Judicious, direct regularization of treatment effect heterogeneity is essential for obtaining low-RMSE estimates of the CATE, and in high-noise settings this can matter more than correctly specifying the functional form of the model (see the sketch below).
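A minimal sketch of what direct regularization of heterogeneity can look like in a linear model, where the baseline coefficients are left unpenalized and only the heterogeneity (CATE) coefficients are shrunk; the design split and penalty weights are illustrative assumptions, not the estimator from the talk:

```python
import numpy as np

def fit_cate_ridge(X, w, y, lam):
    """Fit y ~ X @ b + w * (X @ t): b = baseline coefficients (unpenalized),
    t = heterogeneity coefficients, ridge-penalized with strength lam."""
    n, p = X.shape
    Z = np.hstack([X, w[:, None] * X])          # [baseline | heterogeneity] design
    P = np.zeros((2 * p, 2 * p))
    P[p:, p:] = lam * np.eye(p)                 # penalize only the heterogeneity block
    coef = np.linalg.solve(Z.T @ Z + P, Z.T @ y)
    return coef[:p], coef[p:]                   # CATE estimate at x is x @ t

# Usage: in high noise, heavier shrinkage of t pulls the estimated CATE
# toward a constant effect, which often lowers RMSE.
rng = np.random.default_rng(1)
n, p = 500, 5
X = rng.normal(size=(n, p))
w = rng.binomial(1, 0.5, size=n).astype(float)  # randomized binary treatment
y = X @ rng.normal(size=p) + w * (0.5 * X[:, 0]) + rng.normal(scale=2.0, size=n)
b, t = fit_cate_ridge(X, w, y, lam=10.0)
```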

Advisor: Marina Meila

Energy minimization problems arise frequently when working with binary graph structures such as Markov random fields (MRFs), which are common in spatial statistics and image analysis. The connection between energy minimization and network flow algorithms is well known and frequently exploited when optimizing these models. Under certain conditions -- namely, submodularity -- these techniques, typically called graph cuts, allow the minimum-energy state to be found quickly.
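A minimal sketch of the textbook min-cut construction for a binary MRF with Potts pairwise terms, using networkx (this is the standard construction, not code from the talk; lam >= 0 is what makes the Potts terms submodular):

```python
import networkx as nx

def mrf_mincut(unary, pairs, lam):
    """Minimize sum_i theta_i(x_i) + lam * sum_(i,j) [x_i != x_j] over x in {0,1}^n.
    unary[i] = (theta_i(0), theta_i(1)); pairs = list of neighboring (i, j)."""
    G = nx.DiGraph()
    for i, (c0, c1) in enumerate(unary):
        G.add_edge("s", i, capacity=c1)   # edge is cut exactly when x_i = 1
        G.add_edge(i, "t", capacity=c0)   # edge is cut exactly when x_i = 0
    for i, j in pairs:
        G.add_edge(i, j, capacity=lam)    # cut when neighbors disagree
        G.add_edge(j, i, capacity=lam)
    cut_value, (source_side, _) = nx.minimum_cut(G, "s", "t")
    labels = [0 if i in source_side else 1 for i in range(len(unary))]
    return cut_value, labels              # cut value equals the minimum energy

# Usage: a 3-node chain; node 1's unary term is ambiguous, so smoothing decides it.
energy, x = mrf_mincut([(0.0, 2.0), (1.0, 1.0), (2.0, 0.0)], [(0, 1), (1, 2)], lam=0.5)
```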

Deducing the state or structure of a system from partial, noisy measurements is a fundamental task throughout the sciences and engineering. The resulting inverse problems are often ill-posed because there are fewer measurements available than the ambient dimension of the model to be estimated. In practice, however, many interesting signals or models contain few degrees of freedom relative to their ambient dimension.
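A minimal sketch of this idea in its canonical form: recovering a sparse vector from underdetermined noisy linear measurements via l1 regularization (the dimensions and the use of scikit-learn's Lasso are illustrative assumptions, not the methods of the talk):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p, k = 40, 100, 5                        # 40 measurements, ambient dimension 100
x_true = np.zeros(p)
x_true[rng.choice(p, size=k, replace=False)] = rng.normal(size=k)  # k degrees of freedom
A = rng.normal(size=(n, p)) / np.sqrt(n)    # random measurement matrix
y = A @ x_true + 0.01 * rng.normal(size=n)  # partial, noisy measurements

# Least squares is ill-posed here (n < p); the l1 penalty exploits the
# few degrees of freedom in x_true to make recovery possible.
x_hat = Lasso(alpha=0.01, fit_intercept=False, max_iter=10_000).fit(A, y).coef_
```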

Deep learning and unsupervised feature learning offer the potential to transform many domains such as vision, speech, and natural language processing. However, these methods have been fundamentally limited by available computation and have typically been applied to small problems. In this talk, I describe the key ideas that enabled scaling deep learning algorithms to train a large model on a cluster of 16,000 CPU cores (2000 machines).

We present an exploration of the rich theoretical connections between several classes of regularized models, network flows, and recent results in submodular function theory. This work unifies key aspects of these problems under a common theory, leading to novel methods for working with several important models of interest in statistics, machine learning, and computer vision.

High-dimensional statistics is the basis for analyzing large and complex data sets that are generated by cutting-edge technologies in genetics, neuroscience, astronomy, and many other fields. However, Lasso, Ridge Regression, Graphical Lasso, and other standard methods in high-dimensional statistics depend on tuning parameters that are difficult to calibrate in practice. In this talk, I present two novel approaches to overcome this difficulty.

High-dimensional low-rank structure arises in many applications including genomics, signal processing, and social science. In this talk, we discuss some recent results on high-dimensional low-rank matrix recovery, including low-rank matrix recovery via rank-one projections and structured matrix completion. We provide theoretical justifications for the proposed methods and derive lower bounds for the estimation errors. The proposed estimators are shown to be rate-optimal under certain conditions.
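As a point of reference for this area, here is a minimal sketch of a standard baseline, soft-impute style singular value thresholding for matrix completion (a generic method, not the rank-one projection or structured completion estimators discussed in the talk):

```python
import numpy as np

def soft_impute(M, mask, lam, iters=200):
    """Complete a partially observed matrix: alternately fill the missing
    entries with the current estimate, then shrink singular values by lam
    (a proximal step on the nuclear norm, which promotes low rank)."""
    X = np.zeros_like(M)
    for _ in range(iters):
        filled = np.where(mask, M, X)       # keep observed entries, impute the rest
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        X = (U * np.maximum(s - lam, 0.0)) @ Vt
    return X

# Usage: recover a rank-2 matrix from roughly half of its entries.
rng = np.random.default_rng(3)
A = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 50))  # rank-2 ground truth
mask = rng.random(A.shape) < 0.5                          # observed-entry pattern
A_hat = soft_impute(A, mask, lam=0.5)
```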