# DEN

Denny Hall

### Denny Hall

Reinforcement learning is a general technique that allows an agent to learn an optimal policy and interact with an environment in sequential decision making problems. The goodness of a policy is measured by its value function starting from some initial state. This talk includes a few topics about constructing statistical inference for a policy's value in infinite horizon settings where the number of decision points diverges to infinity. Applications in real world examples will also be discussed.

We introduce the localization schemes framework for analyzing the mixing time of Markov chains. Our framework unifies and extends the previous proof techniques via spectral independence framework by Anari, Liu and Oveis Gharan and the stochastic localization process used for proving high dimensional properties of log-concave measures.

Randomized control trials (RCTs) have been the gold standard to evaluate the effectiveness of a program, policy, or treatment on an outcome of interest. However, many RCTs assume that study participants are willing to share their (potentially sensitive) data, specifically their response to treatment. This assumption, while trivial at first, is becoming difficult to satisfy in the modern era, especially in online settings where there are more regulations to protect individuals' data.

The method of difference-in-differences (DID) is widely used to study the causal effect of policy interventions in observational studies. DID employs a before and after comparison of the treated and control units to remove bias due to time-invariant unmeasured confounders under the parallel trends assumption. Estimates from DID, however, will be biased if the outcomes for the treated and control units evolve differently in the absence of treatment, namely if the parallel trends assumption is violated.

With observational data alone, causal inference is a challenging problem. The task becomes easier when having access to data collected from perturbations of the underlying system, even when the nature of these is unknown. In this talk, we will describe methods that use such perturbation data to identify plausible causal mechanisms and to obtain robust predictions. Specifically, in the context of Gaussian linear structural equation models, we first characterize the interventional equivalence class of DAGs.

In contemporary applications, it is common to collect very large data sets with the vaguely-defined goal of hypothesis generation. Once a dataset is used to generate a hypothesis,  we might wish to test that hypothesis on the same set of data. However, this type of "double dipping" violates a cardinal rule of statistical hypothesis testing: namely, that we must decide what hypothesis to test before looking at the data.

Modern machine learning algorithms have achieved remarkable performance in a myriad of applications, and are increasingly used to make impactful decisions in the hiring process, criminal sentencing, healthcare diagnostics and even to make new scientific discoveries. The use of data-driven algorithms in high-stakes applications is exciting yet alarming: these methods are extremely complex, often brittle, notoriously hard to analyze and interpret.

Scientific research is often concerned with questions of cause and effect. For example, does eating processed meat cause certain types of cancer? Ideally, such questions are answered by randomized controlled experiments. However, these experiments can be costly, time-consuming, unethical or impossible to conduct. Hence, often the only available data to answer causal questions is observational.

Change point detection is a popular tool for identifying locations in a data sequence where an abrupt change occurs in the data distribution and has been widely studied for Euclidean data. Modern data very often is non- Euclidean, for example distribution valued data or network data. Change point detection is a challenging problem when the underlying data space is a metric space where one does not have basic algebraic operations like addition of the data points and scalar multiplication.

Advisor: Jon Wellner We consider the problem of forming confidence intervals and tests for the location of the mode in the setting of nonparametric estimation of a log-concave density. We thus study the class of log-concave densities with fixed and known mode. We find the maximum likelihood estimator for this class, give a characterization of it, and, under the null hypothesis, show our estimator is uniformly consistent and is $n^{2/5}$-tight at the mode. We also show uniqueness of the analogous limiting "estimator" of a quadratic function with white noise.