Denny Hall



Shape restrictions such as monotonicity in one or more dimensions sometimes naturally arise. The restriction can be effectively used for function estimation without smoothing. Several exciting results on function estimation under monotonicity, and to a lesser extent, under multivariate monotonicity have been obtained in the frequentist setting. But only a little is known about how Bayesian methods work when there are restrictions on the shape. Chakraborty and Ghosal recently studied the convergence properties of a "projection-posterior" distribution.


Coauthors: Michael Jansson and Kenichi Nagasawa


The Hawkes process is a popular type of self-exciting point process that has found application in the modeling of financial stock markets, earthquakes, and social media cascades. Its continuous-time framework, however, requires that the data collected for inference be accurate. For real-time data monitoring, for example in remote sensing or cybersecurity, accurate detection of events is challenging.

Recently, addressing “spatial confounding” has become a major topic in spatial statistics. However, the literature has provided conflicting definitions, and many proposed definitions do not address the issue of confounding as it is understood in causal inference.

The US Census Bureau will deliberately corrupt data sets derived from the 2020 US Census in an effort to maintain privacy, suggesting a painful trade-off between the privacy of respondents and the precision of economic analysis. To investigate whether this trade-off is inevitable, we formulate a semiparametric model of causal inference with high dimensional corrupted data. We propose a procedure for data cleaning, estimation, and inference with data cleaning-adjusted confidence intervals.

Randomized experiments allow for consistent estimation of the average treatment effect based on the difference in mean outcomes without strong modeling assumptions. Appropriate use of pretreatment covariates can further improve the estimation efficiency. Missingness in covariates is nevertheless common in practice and raises an important question: should we adjust for covariates subject to missingness, and if so, how? The unadjusted difference in means is always unbiased.
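
As a concrete illustration of the unadjusted estimator mentioned above, the following is a minimal sketch of the difference in means. The function name `difference_in_means` and the toy data are hypothetical and chosen only for illustration; they do not come from the talk, and the sketch does not address covariate adjustment or missingness.

```python
from statistics import mean


def difference_in_means(outcomes, treated):
    """Estimate the average treatment effect as the mean outcome of
    treated units minus the mean outcome of control units."""
    treated_y = [y for y, t in zip(outcomes, treated) if t]
    control_y = [y for y, t in zip(outcomes, treated) if not t]
    if not treated_y or not control_y:
        raise ValueError("need at least one treated and one control unit")
    return mean(treated_y) - mean(control_y)


# Hypothetical toy data: four units, the first two treated.
outcomes = [3.0, 5.0, 2.0, 4.0]
treated = [1, 1, 0, 0]
estimate = difference_in_means(outcomes, treated)
```

Because the estimator uses outcomes and treatment indicators only, missing covariates do not affect it; the question raised in the abstract is whether adjusting for such covariates can improve on it.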

Quantifying treatment effect heterogeneity is a crucial task in many areas of causal inference, e.g. optimal treatment allocation and estimation of subgroup effects. We study the problem of estimating the level sets of the conditional average treatment effect (CATE), identified under the no-unmeasured-confounders assumption. Given a user-specified threshold, the goal is to estimate the set of all units for whom the treatment effect exceeds that threshold.

Emerging modern datasets in public health call for development of innovative statistical methods that can leverage complex real-world data settings. We first discuss a stochastic epidemic model that incorporates contact tracing data to make inference about transmission dynamics on an adaptive contact network. An efficient data-augmented inference scheme is designed to accommodate partially observed epidemic data.

Reinforcement learning is a general technique that allows an agent to learn an optimal policy and interact with an environment in sequential decision making problems. The goodness of a policy is measured by its value function starting from some initial state. This talk includes a few topics about constructing statistical inference for a policy's value in infinite horizon settings where the number of decision points diverges to infinity. Applications in real world examples will also be discussed.

We introduce the localization schemes framework for analyzing the mixing time of Markov chains. Our framework unifies and extends the previous proof techniques via the spectral independence framework by Anari, Liu and Oveis Gharan and the stochastic localization process used for proving high dimensional properties of log-concave measures.

Randomized controlled trials (RCTs) have been the gold standard to evaluate the effectiveness of a program, policy, or treatment on an outcome of interest. However, many RCTs assume that study participants are willing to share their (potentially sensitive) data, specifically their response to treatment. This assumption, while seemingly trivial at first, is becoming difficult to satisfy in the modern era, especially in online settings where there are more regulations to protect individuals' data.

The method of difference-in-differences (DID) is widely used to study the causal effect of policy interventions in observational studies. DID employs a before and after comparison of the treated and control units to remove bias due to time-invariant unmeasured confounders under the parallel trends assumption. Estimates from DID, however, will be biased if the outcomes for the treated and control units evolve differently in the absence of treatment, namely if the parallel trends assumption is violated.
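
The before-and-after comparison described above can be sketched for the simplest case of two groups and two periods. This is a minimal illustration under the parallel trends assumption, not the method of the talk; the function name `did_estimate` and the numbers are hypothetical.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences: the change in the
    treated group's mean outcome minus the change in the control group's."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcomes before and after the intervention.
did = did_estimate(
    treated_pre=[10.0, 12.0],
    treated_post=[15.0, 17.0],
    control_pre=[8.0, 10.0],
    control_post=[9.0, 11.0],
)
```

Subtracting the control group's change removes any time trend shared by both groups. If the groups would have trended differently without treatment, that difference is absorbed into the estimate, which is the bias the abstract refers to.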

With observational data alone, causal inference is a challenging problem. The task becomes easier when having access to data collected from perturbations of the underlying system, even when the nature of these is unknown. In this talk, we will describe methods that use such perturbation data to identify plausible causal mechanisms and to obtain robust predictions. Specifically, in the context of Gaussian linear structural equation models, we first characterize the interventional equivalence class of DAGs.

In contemporary applications, it is common to collect very large data sets with the vaguely-defined goal of hypothesis generation. Once a dataset is used to generate a hypothesis, we might wish to test that hypothesis on the same set of data. However, this type of "double dipping" violates a cardinal rule of statistical hypothesis testing: namely, that we must decide what hypothesis to test before looking at the data.

Modern machine learning algorithms have achieved remarkable performance in a myriad of applications, and are increasingly used to make impactful decisions in the hiring process, criminal sentencing, healthcare diagnostics and even to make new scientific discoveries. The use of data-driven algorithms in high-stakes applications is exciting yet alarming: these methods are extremely complex, often brittle, and notoriously hard to analyze and interpret.

Scientific research is often concerned with questions of cause and effect. For example, does eating processed meat cause certain types of cancer? Ideally, such questions are answered by randomized controlled experiments. However, these experiments can be costly, time-consuming, unethical or impossible to conduct. Hence, often the only available data to answer causal questions is observational.  

Change point detection is a popular tool for identifying locations in a data sequence where an abrupt change occurs in the data distribution and has been widely studied for Euclidean data. Modern data very often is non-Euclidean, for example distribution-valued data or network data. Change point detection is a challenging problem when the underlying data space is a metric space where one does not have basic algebraic operations like addition of the data points and scalar multiplication.

Advisor: Jon Wellner

We consider the problem of forming confidence intervals and tests for the location of the mode in the setting of nonparametric estimation of a log-concave density. We thus study the class of log-concave densities with fixed and known mode. We find the maximum likelihood estimator for this class, give a characterization of it, and, under the null hypothesis, show our estimator is uniformly consistent and is $n^{2/5}$-tight at the mode. We also show uniqueness of the analogous limiting "estimator" of a quadratic function with white noise.

Advisor: Adrian E. Raftery