# Microsoft

# Multiplicative Models for Register Data

Registers are increasingly important sources of data to be analyzed. Examples include registers of congenital abnormalities, supermarket purchases, or traffic violations. In such registers, records are created when a relevant event is observed, and they contain the features characterizing the event. Understanding the structure of associations among the features is of primary interest. However, the registers often do not contain cases in which no feature is present and therefore, standard multiplicative or log-linear models may not be applicable.

# Instrumental Variable Learning of Marginal Structural Models

In a seminal paper, Robins (1998) introduced marginal structural models (MSMs), a general class of counterfactual models for the joint effects of time-varying treatment regimes in complex longitudinal studies subject to time-varying confounding. He established identification of MSM parameters under a sequential randomization assumption (SRA), which rules out unmeasured confounding of treatment assignment over time.

# A New Standard for the Analysis and Design of Replication Studies

A new standard is proposed for the evidential assessment of replication studies. The approach combines a specific reverse-Bayes technique with prior-predictive tail probabilities to define replication success. The method gives rise to a quantitative measure for replication success, called the sceptical p-value. The sceptical p-value integrates traditional significance of both the original and replication study with a comparison of the respective effect sizes.

# Spectral Gap in Random Bipartite Biregular Graphs and Applications

The asymptotics of the second-largest eigenvalue in random regular graphs (also referred to as the "Alon conjecture") were computed by Joel Friedman in his celebrated 2004 paper. Recently, Charles Bordenave gave a new proof of this result using the non-backtracking operator and the Ihara-Bass formula. In the same spirit, we have translated Bordenave's ideas to bipartite biregular graphs to calculate the asymptotic value of the second-largest pair of eigenvalues, and obtained a similar spectral gap result.
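The quantity at stake can be seen numerically. The sketch below does not use the proof technique from the talk. It samples a random (d1, d2)-biregular bipartite multigraph from a configuration model (the helper `random_biregular_biadjacency` is hypothetical and keeps multi-edges as a simplification), then compares the top two singular values of the biadjacency matrix with the trivial eigenvalue sqrt(d1*d2) and the reference value sqrt(d1-1) + sqrt(d2-1). These singular values equal the top positive eigenvalues of the full adjacency matrix.

```python
import numpy as np

def random_biregular_biadjacency(n1, d1, d2, seed=0):
    """Biadjacency matrix (n1 x n2) of a random (d1, d2)-biregular bipartite
    multigraph drawn from the configuration model; multi-edges are kept."""
    if (n1 * d1) % d2 != 0:
        raise ValueError("n1 * d1 must be divisible by d2")
    n2 = n1 * d1 // d2
    rng = np.random.default_rng(seed)
    left_stubs = np.repeat(np.arange(n1), d1)                      # each left vertex has d1 stubs
    right_stubs = rng.permutation(np.repeat(np.arange(n2), d2))    # shuffled right stubs
    B = np.zeros((n1, n2))
    np.add.at(B, (left_stubs, right_stubs), 1)                     # pair stubs, counting multi-edges
    return B

B = random_biregular_biadjacency(n1=600, d1=3, d2=4)
s = np.linalg.svd(B, compute_uv=False)            # singular values in descending order
print(s[0], np.sqrt(3 * 4))                       # trivial eigenvalue sqrt(d1 * d2)
print(s[1], np.sqrt(3 - 1) + np.sqrt(4 - 1))      # second eigenvalue vs. reference value
```

The spectral gap is the distance between the first and second printed values. For finite n the second singular value fluctuates around the reference value.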

# Variational Analysis of Empirical Risk Minimization

This talk presents a variational framework for the asymptotic analysis of empirical risk minimization in general settings. In its most general form the framework concerns a two-stage inference procedure. In the first stage of the procedure, an average loss criterion is used to fit the trajectory of an observed dynamical system with a trajectory of a reference dynamical system. In the second stage of the procedure, a parameter estimate is obtained from the optimal trajectory of the reference system.

# Fast Inference for Spatial Generalized Linear Mixed Models

Non-Gaussian spatial data arise in a number of disciplines. Examples include spatial data on disease incidences (counts), and satellite images of ice sheets (presence-absence). Spatial generalized linear mixed models (SGLMMs), which build on latent Gaussian processes or Markov random fields, are convenient and flexible models for such data and are used widely in mainstream statistics and other disciplines. For high-dimensional data, SGLMMs present significant computational challenges due to the large number of dependent spatial random effects.

# Sequential change-point detection for a network of Hawkes processes

Hawkes processes have been a popular point process model for capturing mutual excitation of discrete events. In the network setting, they can capture the mutual influence between nodes, which has a wide range of applications in neuroscience, social networks, and crime data analysis. In this talk, I will present a statistical change-point detection framework to detect, in real time, a change in the influence from streaming discrete events.
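The following is a minimal sketch of the object being monitored. It assumes a multivariate Hawkes process with an exponential kernel, which is a common choice and not necessarily the model used in the talk. The intensity of node j is a baseline rate plus decaying contributions from past events, weighted by an influence matrix `A`. The function name and parameterization are hypothetical. In this setting, a change in influence would appear as a change in `A`.

```python
import math

def network_intensity(t, events, mu, A, beta):
    """Conditional intensities lambda_j(t) of a multivariate Hawkes process.

    events: list of (time, node) pairs observed so far.
    mu[j]:  baseline rate of node j.
    A[i][j]: influence of an event at node i on node j (hypothetical parameterization;
             with the beta-normalized kernel below it is an expected offspring count).
    beta:   decay rate of the exponential kernel beta * exp(-beta * dt).
    """
    lam = list(mu)
    for s, i in events:
        if s < t:  # only strictly past events excite the process
            kernel = beta * math.exp(-beta * (t - s))
            for j in range(len(mu)):
                lam[j] += A[i][j] * kernel
    return lam

lam = network_intensity(1.0, [(0.0, 0)], mu=[0.1, 0.2], A=[[0.5, 0.0], [0.3, 0.2]], beta=1.0)
```

In this example, the event at node 0 raises node 0's intensity and leaves node 1 at its baseline, because `A[0][1]` is zero.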

# Exploring dynamic complex systems using time-varying networks

Extracting knowledge and providing insights into the complex mechanisms underlying noisy high-dimensional data sets is of utmost importance in many scientific domains. Networks are an example of simple, yet powerful tools for capturing relationships among entities over time. For example, in social media, networks represent connections between different individuals and the type of interaction that two individuals have. In systems biology, networks can represent the complex regulatory circuitry that controls cell behavior.

# Estimation of a Two-component Mixture Model with Applications to Multiple Testing

We consider estimation and inference in a two-component mixture model where the distribution of one component is completely unknown. We develop methods for estimating the mixing proportion and the unknown distribution nonparametrically, given i.i.d. data from the mixture model. We use ideas from shape-restricted function estimation and develop "tuning parameter free" estimators that are easily implementable and have good finite sample performance. We establish the consistency of our procedures.

# Stability

Reproducibility is imperative for any scientific discovery. More often than not, modern scientific findings rely on statistical analysis of high-dimensional data. At a minimum, reproducibility manifests itself in stability of statistical results relative to "reasonable" perturbations to data and to the model used. The jackknife, the bootstrap, and cross-validation are based on perturbations to data, while robust statistics methods deal with perturbations to models. In this talk, a case is made for the importance of stability in statistics.
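One concrete way to measure stability under data perturbation is to resample the data with the bootstrap, re-run the same estimator, and record how much the result moves. The sketch below uses the standard deviation across resamples as the stability measure. This is an illustrative choice, and the helper name `bootstrap_spread` is hypothetical.

```python
import random
import statistics

def bootstrap_spread(data, estimator, n_boot=1000, seed=0):
    """Apply `estimator` to `n_boot` bootstrap resamples of `data` and return the
    standard deviation of the resulting estimates: a simple stability measure
    (smaller means the estimate is less sensitive to perturbing the data)."""
    rng = random.Random(seed)
    n = len(data)
    estimates = [estimator(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(estimates)

data = [2.1, 1.9, 2.4, 2.0, 9.5, 2.2, 1.8]
print(bootstrap_spread(data, statistics.median))
print(bootstrap_spread(data, statistics.mean))
```

If you swap the estimator while keeping the same resampling seed, both estimators face identical perturbations of the data.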

# Your Dreams May Come True with MTP2

We study maximum likelihood estimation for exponential families that are multivariate totally positive of order two (MTP2). Such distributions appear in the context of ferromagnetism in the Ising model and various latent models, as for example Brownian motion tree models used in phylogenetics. We show that maximum likelihood estimation for MTP2 exponential families is a convex optimization problem. For quadratic exponential families such as Ising models and Gaussian graphical models, we show that MTP2 implies sparsity of the underlying graph without the need of a tuning parameter.
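For Gaussian graphical models, MTP2 has a simple characterization: a Gaussian distribution is MTP2 exactly when its precision matrix (the inverse covariance) has no positive off-diagonal entries. The zero pattern of those off-diagonal entries gives the edges of the graph. The following sketch checks the condition for a given covariance matrix. The function name and the numerical tolerance are assumptions for illustration.

```python
import numpy as np

def is_gaussian_mtp2(cov, tol=1e-10):
    """Return True if the Gaussian with covariance `cov` is MTP2, i.e. if every
    off-diagonal entry of the precision matrix K = inv(cov) is <= 0 (up to `tol`)."""
    K = np.linalg.inv(np.asarray(cov, dtype=float))
    off_diagonal = K[~np.eye(K.shape[0], dtype=bool)]
    return bool(np.all(off_diagonal <= tol))

print(is_gaussian_mtp2([[1.0, 0.5], [0.5, 1.0]]))    # positive correlation: K_12 < 0
print(is_gaussian_mtp2([[1.0, -0.5], [-0.5, 1.0]]))  # negative correlation: K_12 > 0
```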

# Learning and Estimation: Separated at Birth, Reunited at Last

We consider the problem of regression in three scenarios: (a) random design under the assumption that the model F is well-specified; (b) distribution-free statistical learning with respect to a reference class F; and (c) online regression with no assumption on the generative process. The first problem is often studied in the literature on nonparametric estimation, the second falls within the purview of statistical learning theory, and the third is studied within the online learning community.

# Random Networks, Graphical models, and Exchangeability

We study conditional independence relationships for random networks and their interplay with exchangeability. We show that, for finitely exchangeable network models, the empirical subgraph densities are maximum likelihood estimates of their theoretical counterparts. We then characterize all possible Markov structures for finitely exchangeable random graphs, thereby identifying a new class of Markov network models corresponding to bidirected Kneser graphs.

# Accelerating Exact MCMC with Subsets of Data

One of the challenges of building statistical models for large data sets is balancing the correctness of inference procedures against computational realities. In the context of Bayesian procedures, the pain of such computations has been particularly acute as it has appeared that algorithms such as Markov chain Monte Carlo necessarily need to touch all of the data at each iteration in order to arrive at a correct answer. Several recent proposals have been made to use subsets (or "minibatches") of data to perform MCMC in ways analogous to stochastic gradient descent.

# Generalized Fiducial Inference: A Review

R. A. Fisher, the father of modern statistics, proposed the idea of fiducial inference in the 1930s. While his proposal led to some interesting methods for quantifying uncertainty, other prominent statisticians of the time did not accept Fisher's approach because it conflicted with the prevailing ideas of statistical inference.

# Adaptive Piecewise Polynomial Estimation via Trend Filtering

We discuss trend filtering, a recently proposed tool of Kim et al. (2009) for nonparametric regression. The trend filtering estimate is defined as the minimizer of a penalized least squares criterion, in which the penalty term sums the absolute kth order discrete derivatives over the input points.
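The criterion can be written down directly for evenly spaced input points. The sketch below assumes evenly spaced inputs and only evaluates the penalized least squares objective. Actually minimizing it requires a convex solver, which is not shown. The parameter `order` is the number of times the first difference is applied. Papers index the trend filtering order differently, so check which convention a given reference uses. Both function names are hypothetical.

```python
def discrete_difference(beta, order):
    """Apply the first-order difference operator `order` times."""
    d = list(beta)
    for _ in range(order):
        d = [d[i + 1] - d[i] for i in range(len(d) - 1)]
    return d

def trend_filtering_objective(y, beta, lam, order):
    """Penalized least squares criterion for evenly spaced inputs:
    0.5 * sum_i (y_i - beta_i)^2 + lam * sum_i |(D^order beta)_i|."""
    fit = 0.5 * sum((yi - bi) ** 2 for yi, bi in zip(y, beta))
    penalty = sum(abs(v) for v in discrete_difference(beta, order))
    return fit + lam * penalty

# A linear fit has zero second differences, so it is not penalized when order=2.
print(trend_filtering_objective([1, 2, 3], [1, 2, 3], lam=1.0, order=2))
```

Because the penalty is an L1 norm of differences, minimizers tend to have many exactly zero differences. This is what makes the fitted trend piecewise polynomial.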

# Exploring the Structure of Networks and Communities

Networks are all around us: social networks allow for information and influence flow through society, viruses become epidemics by spreading through networks, and networks of neurons allow us to think and function. With the recent technological advances and the development of online social media we can study networks that were once essentially invisible to us. In this talk we discuss how computational perspectives and machine learning models can be developed to abstract networked phenomena like: How will a community or a social network evolve in the future?

# Probabilistic Topic Models of Text and Users

Probabilistic topic models provide a suite of tools for analyzing large document collections. Topic modeling algorithms discover the latent themes that underlie the documents and identify how each document exhibits those themes. Topic modeling can be used to help explore, summarize, and form predictions about documents.

# Bayesian Models for Integrative Genomics

Novel methodological questions are being generated in the biological sciences, requiring the integration of different concepts, methods, tools, and data types. Bayesian methods that employ variable selection have been particularly successful for genomic applications, as they can handle situations where the number of measured variables is much greater than the number of observations. In this talk I will focus on models that integrate experimental data from different platforms with prior knowledge.

# Controlling False Discovery Rate Via Knockoffs

In many fields of science, we observe a response variable together with a large number of potential explanatory variables, and would like to be able to discover which variables are associated with the response, while controlling the false discovery rate (FDR) to ensure that our results are reliable and replicable. The knockoff filter is a variable selection procedure for linear regression, proven to control FDR exactly under any type of correlation structure in the regime where n>p (sample size > number of variables).
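The final selection step of the knockoff filter is a data-dependent threshold on per-variable statistics W_j, where a large positive W_j is evidence that variable j matters. The sketch below assumes the W_j have already been computed; constructing knockoffs is not shown. It applies the "knockoff+" threshold T, the smallest t among the nonzero |W_j| such that (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q, and then selects {j : W_j >= T}. The function name is hypothetical.

```python
def knockoff_plus_select(W, q):
    """Return indices selected by the knockoff+ threshold at target FDR level q.

    W: knockoff statistics, one per variable (large positive favors the original
       variable over its knockoff). Returns [] if no threshold qualifies.
    """
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:  # ascending, so the first qualifying t is the minimum
        estimated_false = 1 + sum(1 for w in W if w <= -t)
        n_selected = sum(1 for w in W if w >= t)
        if estimated_false / max(1, n_selected) <= q:
            return [j for j, w in enumerate(W) if w >= t]
    return []

print(knockoff_plus_select([5, 4, 3, 2, 1, -0.5], q=0.25))
```

The count of large negative statistics serves as an estimate of the number of false positives among the large positive ones. The "+1" in the numerator is the adjustment the knockoff+ variant uses to obtain exact FDR control.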

# Statistical Methods for Ambulance Fleet Management

We introduce statistical methods to address two forecasting problems arising in the management of ambulance fleets: (1) predicting the time it takes an ambulance to drive to the scene of an emergency; and (2) space-time forecasting of ambulance demand. These predictions are used for deciding how many ambulances should be deployed at a given time and where they should be stationed, which ambulance should be dispatched to an emergency, and whether and how to schedule ambulances for non-urgent patient transfers.

# PDW Methods for Support Recovery in Nonconvex High-Dimensional Problems

The primal-dual witness (PDW) technique is a now-standard proof strategy for establishing variable selection consistency for sparse high-dimensional estimation problems when the objective function and regularizer are convex. The method proceeds by optimizing the objective function over the parameter space restricted to the true support of the unknown vector, then using a dual witness to certify that the resulting solution is also a global optimum of the unrestricted problem.

# Flexible, Reliable, and Scalable Nonparametric Learning

Applications of statistical machine learning increasingly involve datasets with rich hierarchical, temporal, spatial, or relational structure. Bayesian nonparametric models offer the promise of effective learning from big datasets, but standard inference algorithms often fail in subtle and hard-to-diagnose ways. We explore this issue via variants of a popular and general model family, the hierarchical Dirichlet process.

# Why Do Statisticians Treat Predictors as Fixed? -- A Conspiracy Theory

Regression models form the workhorse of much of statistical practice. A curious aspect of regression is that statisticians tend to treat the predictors as fixed quantities. In a majority of applications, however, the data are observational and hence the predictors are as random as the response. The division of the variables into predictors and response is therefore a human decision that is pragmatic as opposed to necessary -- unless it is justified by a very strong causal theory.

# Optimal Design of Experiments in the Presence of Network Interference

Causal inference research in statistics has been largely concerned with estimating the effect of treatment (e.g., personalized tutoring) on outcomes (e.g., test scores) under the assumption of "lack of interference"; that is, the assumption that the outcome of an individual does not depend on the treatment assigned to others. Moreover, whenever its relevance is acknowledged (e.g., study groups), interference is typically dealt with as an uninteresting source of variation in the data.

# Reproducibility of Science: P-values and Multiplicity

Published scientific findings seem to be increasingly failing efforts at replication. This is undoubtedly due to many sources, including specifics of individual scientific cultures and overall scientific biases such as publication bias. While these will be briefly discussed, the talk will focus on the all-too-common misuse of p-values and failure to properly account for multiplicities as two likely major contributors to the lack of reproducibility. The Bayesian approaches to both testing and multiplicity will be highlighted as possible general solutions to the problem.

# From Data to Decisions

I will present directions in harnessing predictive models to guide decision making. I will first discuss methods for using machine learning to ideally couple human and computational effort, focusing on several illustrative efforts, including spoken dialog systems and citizen science. Then I will turn to challenges in healthcare and describe work to field statistical models in real-world clinical settings, focusing on the opportunity to join predictions about outcomes with utility models to guide intervention.

# Nonparametric Graphical Model: Foundation and Trends

We consider the problem of learning the structure of a non-Gaussian graphical model. We introduce two strategies for constructing tractable nonparametric graphical model families. One approach is through semiparametric extension of the Gaussian or exponential family graphical models that allows arbitrary graphs. Another approach is to restrict the family of allowed graphs to be acyclic, enabling the use of fully nonparametric density estimation in high dimensions.

# Cosmology in the Era of Big Data: Understanding Our Universe a Bit at a Time

With the development of new detectors, telescopes and computational facilities, astrophysics has entered an era of data intensive science. During the last decade, astronomers have surveyed the sky across many decades of the electromagnetic spectrum, collecting hundreds of terabytes of astronomical images for hundreds of millions of sources. Over the next decade, data volumes will reach tens of petabytes, and provide accurate measurements for billions of sources.

# Machine Teaching

For many ML problems, labeled data is readily available and the algorithm is the bottleneck. This is the ML researcher's paradise! Problems that have fairly stable distributions and can accumulate large quantities of human labels over time have this property: vision, speech, autonomous driving. Problems that have shifting distributions and an infinite supply of labels through history are blessed in the same way: click prediction, data analytics, forecasting. We call these problems the "head" of ML.

# Additive and Interaction Models for Nonparametric Functional and Object Regression, with Application to Ophthalmological Multi-level Functional Data on Spherical Domains

Glaucoma is an ocular condition involving damage to the optic nerve that can lead to blindness if not treated. Its complete etiology is not known, but it is hypothesized that biomechanics in regions of the sclera most adjacent to the optic nerve head under intraocular pressure (IOP) may play a role. It is proposed that as people age, their scleral surface may lose elasticity, causing less displacement in response to IOP and thus increased pressure affecting the optic nerve head (ONH).

# Micro-Randomized Trials and Statistical Methods for Mobile Health

Micro-randomized trials are trials in which individuals are randomized hundreds or thousands of times over the course of the study. The goal of these trials is to assess the impact of "in-the-moment" interventions, i.e., interventions intended to impact behavior over short time intervals. We discuss the design and analysis of these types of trials, with a focus on their use in developing mobile health interventions.
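To see how a single participant ends up randomized hundreds of times, consider a hypothetical design with 5 decision points per day over 42 days, which gives 210 randomizations per person. The sketch below generates such a schedule with a fixed treatment probability `p`. The numbers, the constant probability, and the helper name are assumptions for illustration. Real designs may vary the probability or randomize only when a participant is available.

```python
import random

def micro_randomize(n_participants, days, decisions_per_day, p=0.5, seed=0):
    """Return {participant_id: [0/1 treatment indicator per decision point]},
    randomizing every participant independently at every decision point."""
    rng = random.Random(seed)
    n_decisions = days * decisions_per_day
    return {
        i: [1 if rng.random() < p else 0 for _ in range(n_decisions)]
        for i in range(n_participants)
    }

schedule = micro_randomize(n_participants=30, days=42, decisions_per_day=5, p=0.6)
print(len(schedule[0]))  # randomizations per participant
```

Because every participant is randomized at every decision point, each person provides many within-person treatment contrasts. This is what allows the effects of in-the-moment interventions to be estimated over short time intervals.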