Tree-Based Aggregation in Statistical Learning
Statistical Challenges and Opportunities in Two-sided Markets With application in Tech Business.
Black-box tests for algorithmic stability
Algorithmic stability is a concept from learning theory that expresses the degree to which changes to the input data (e.g., removal of a single data point) may affect the outputs of a regression algorithm. Knowing an algorithm's stability properties is useful for many downstream applications: for example, stability is known to lead to desirable generalization properties and predictive inference guarantees.
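To make the idea of perturbing the input by removing a single data point concrete, here is a minimal sketch that measures how much a fitted prediction moves under each leave-one-out removal. It is an illustration only and not the black-box test discussed in the talk; the simple least-squares line and the function names `fit_line` and `max_loo_prediction_change` are assumptions chosen for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def max_loo_prediction_change(xs, ys, x_test):
    """Largest absolute change in the prediction at x_test when one point is removed."""
    xs, ys = list(xs), list(ys)
    slope, intercept = fit_line(xs, ys)
    full_prediction = slope * x_test + intercept
    changes = []
    for i in range(len(xs)):
        # Refit on the data set with the i-th point dropped.
        s, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        changes.append(abs(s * x_test + b - full_prediction))
    return max(changes)


# Hypothetical data: the last response is an outlier, so removing it moves the fit.
clean = max_loo_prediction_change([0, 1, 2, 3], [0, 1, 2, 3], x_test=4)
noisy = max_loo_prediction_change([0, 1, 2, 3], [0, 1, 2, 10], x_test=4)
```

A small value indicates that no single observation has much influence on the prediction at `x_test`; the outlier case yields a strictly larger value than the exactly linear case.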
Learning Attribute Hierarchies from Data: Exploratory Approaches
In cognitive diagnostic assessment in education, multiple fine-grained attributes are measured simultaneously. Attribute hierarchies are considered important structural features of cognitive diagnostic models (CDM) that provide useful information about the nature of attributes. Templin and Bradshaw (2014) first introduced a hierarchical diagnostic classification model (HDCM) that directly takes into account attribute hierarchies, and hence HDCM is nested within more general CDMs.
Optimal SLOPE Power and False Positives Trade-off
SLOPE is a relatively new convex optimization procedure for high-dimensional linear regression via the sorted l1 penalty: the larger the rank of the fitted coefficient, the larger the penalty. This non-separable penalty renders many existing techniques invalid or inconclusive in analyzing the SLOPE solution. In this talk, I demonstrate an asymptotically exact characterization of the SLOPE solution under Gaussian random designs through solving the SLOPE problem using approximate message passing (AMP).
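The sorted l1 penalty described above can be sketched in a few lines: the coefficient magnitudes are sorted in decreasing order and paired with a non-increasing sequence of penalty weights, so that larger coefficients receive larger penalties. The function name `sorted_l1_penalty` and the example weights are assumptions made for illustration; the sketch evaluates the penalty only and does not solve the SLOPE optimization problem.

```python
def sorted_l1_penalty(beta, lambdas):
    """Sum of lambdas[i] times the i-th largest coefficient magnitude."""
    if len(beta) != len(lambdas):
        raise ValueError("beta and lambdas must have the same length")
    if any(lam < 0 for lam in lambdas):
        raise ValueError("lambdas must be non-negative")
    if any(a < b for a, b in zip(lambdas, lambdas[1:])):
        raise ValueError("lambdas must be non-increasing")
    # Sort magnitudes from largest to smallest so the largest weight
    # is applied to the largest coefficient.
    magnitudes = sorted((abs(b) for b in beta), reverse=True)
    return sum(lam * m for lam, m in zip(lambdas, magnitudes))


penalty = sorted_l1_penalty([0.5, -3.0, 1.0], [3.0, 2.0, 1.0])
# 3.0 * 3.0 + 2.0 * 1.0 + 1.0 * 0.5 = 11.5
```

When all weights are equal the penalty reduces to the ordinary l1 (Lasso) penalty. Because the weight attached to a coefficient depends on its rank among all coefficients, the penalty cannot be written as a sum of per-coordinate terms, which is the non-separability mentioned above.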
Optimizing Healthcare Interventions and Incentives: Models and Insights for Depression and Influenza
Improving healthcare interventions and incentives has received much attention due to excessive costs and poor quality of care. The goal is to identify optimal strategies to improve population health outcomes, while accounting for patients’ heterogeneity in disease progression and population dynamics. I present two applications under U.S. healthcare settings.
Uncover Hidden Fine-Grained Scientific Information: Structured Latent Attribute Models
In modern psychological and biomedical research with diagnostic purposes, scientists often formulate the key task as inferring the fine-grained latent information under structural constraints. These structural constraints usually come from the domain experts’ prior knowledge or insight. The emerging family of Structured Latent Attribute Models (SLAMs) accommodates these modeling needs and has received substantial attention in psychology, education, and epidemiology. SLAMs bring exciting opportunities and unique challenges.
Hidden Markov Model Architectures for the Analysis of Ecological and Environmental Data
Hidden Markov models (HMMs) are a popular class of models used in the analysis of sequential data, particularly time series data. A discrete-time, finite-state HMM is a doubly stochastic process composed of a state process S and a state-dependent observation process Y, with the observations taken to be conditionally independent given the states. The state process is assumed to be generated according to an underlying Markov chain, while the observations are generated according to a set of state-dependent distributions.
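The structure described above (a Markov chain for the states, plus state-dependent observation distributions that are conditionally independent given the states) is what makes the likelihood of an observed sequence computable by the forward recursion. The following is a minimal sketch for discrete observations; the function name `forward_likelihood` and the two-state parameter values are assumptions for illustration, and no scaling is applied, so it is suitable only for short sequences.

```python
def forward_likelihood(init, trans, emit, observations):
    """Likelihood of an observation sequence under a discrete HMM.

    init[s]      : probability that the first state is s
    trans[r][s]  : probability of moving from state r to state s
    emit[s][y]   : probability of observing symbol y in state s
    """
    n_states = len(init)
    # alpha[s] is the joint probability of the observations so far
    # and of currently being in state s.
    alpha = [init[s] * emit[s][observations[0]] for s in range(n_states)]
    for y in observations[1:]:
        alpha = [
            emit[s][y] * sum(alpha[r] * trans[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state model with two observation symbols (0 and 1).
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.8, 0.2], [0.3, 0.7]]
likelihood = forward_likelihood(init, trans, emit, [0, 1])  # approximately 0.19
```

The recursion costs on the order of T times the squared number of states for a sequence of length T, instead of summing over every possible state path.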
Data Science Methods to Reduce Inequality and Improve Healthcare
I will describe how to use data science methods to understand and reduce inequality in two domains: criminal justice and healthcare. First, I will discuss how to use Bayesian modeling to detect racial discrimination in policing. Second, I will describe how to use machine learning to explain racial and socioeconomic inequality in pain.
Hierarchical Priors for Bayesian Model Selection in Informative and Non-Informative Settings
Prior elicitation is a foundational problem in Bayesian statistics, particularly in the context of hypothesis testing and model selection. On one end of the spectrum, it is well known that standard “non-informative” priors used for parameter estimation in contexts where little prior information is available can lead to ill-defined or inconsistent Bayes factors. On the other end, ignoring structural information available in specific problems can lead to procedures with suboptimal (frequentist) properties.
Graphical Modeling of Local Independence in Dynamical Systems
Local independence is an asymmetric notion of independence which describes how a system of stochastic processes (e.g. point processes or diffusions) evolves over time. Let A, B, and C be three subsets of the coordinate processes of the stochastic system. Intuitively speaking, B is locally independent of A given C if at every point in time knowing the past of both A and C is not more informative about the present of B than knowing the past of C only.
Center-Outward R-Estimation for Semiparametric VARMA Models
We propose a new class of estimators for semiparametric VARMA models with unspecified innovation density. Our estimators are based on the measure transportation-based concepts of multivariate center-outward ranks and signs. Root-n consistency and asymptotic normality are obtained under a broad class of innovation densities including, e.g., multimodal mixtures of Gaussians.
Adaptive Experimental Design for Multiple Testing and Best Identification
Adaptive experimental design (AED), or active learning, leverages already-collected data to guide future measurements, in a closed loop, to collect the most informative data for the learning problem at hand. In both theory and practice, AED can extract considerably richer insights than any measurement plan fixed in advance, using the same statistical budget.
Consistent Weighted Sampling, Min-Max Kernel, and Connections to Computing with Big Data
In this talk, I will introduce the ideas of min-max similarity (which can be viewed as a type of non-linear kernel) and consistent weighted sampling (CWS). These topics might be relatively new to the statistics community. In a paper in 2015, I demonstrated the surprisingly superb performance of min-max similarity in the context of kernel classification, compared to the standard linear or Gaussian kernels.
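For readers new to min-max similarity, here is a minimal sketch using its common definition for non-negative vectors: the sum of coordinate-wise minima divided by the sum of coordinate-wise maxima. The abstract does not spell out the formula, so this definition and the function name `min_max_similarity` are assumptions for illustration; consistent weighted sampling itself is not implemented here.

```python
def min_max_similarity(x, y):
    """Min-max similarity of two non-negative vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("vectors must be non-negative")
    denominator = sum(max(a, b) for a, b in zip(x, y))
    if denominator == 0:
        raise ValueError("similarity is undefined for two all-zero vectors")
    numerator = sum(min(a, b) for a, b in zip(x, y))
    return numerator / denominator


similarity = min_max_similarity([1, 2, 0], [2, 1, 1])
# minima sum to 1 + 1 + 0 = 2, maxima sum to 2 + 2 + 1 = 5, so the result is 0.4
```

The value lies between 0 and 1 and equals 1 exactly when the two vectors coincide; for binary vectors it reduces to the Jaccard (resemblance) similarity of their supports.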
High-Dimensional Principal Component Analysis with Heterogeneous Missingness
We study the problem of high-dimensional Principal Component Analysis (PCA) with missing observations. In simple, homogeneous missingness settings with a noise level of constant order, we show that an existing inverse-probability weighted (IPW) estimator of the leading principal components can (nearly) attain the minimax optimal rate of convergence.
Diversity, Equity, and Inclusion Department Climate Study Results
The UW Center for Evaluation and Research for STEM Equity (CERSE) conducted a climate survey and focus groups in the Statistics Department over the past calendar year that involved students, faculty, and staff. In this seminar, several of CERSE's Research Scientists will share the results of the climate study. The results fall under the following themes: Inclusion; Community; Communication; Career & Advising Support. CERSE will conclude with some recommendations moving forward to foster an increasingly equitable departmental climate.
Robust Estimation: Optimal Rates, Computation and Adaptation
I will discuss the problem of statistical estimation with contaminated data. In the first part of the talk, I will discuss depth-based approaches that achieve minimax rates in various problems. In general, the minimax rate of a given problem with contamination consists of two terms: the statistical complexity without contamination, and the contamination effect in the form of a modulus of continuity. In the second part of the talk, I will discuss computational challenges of these depth-based estimators.
Modified Multidimensional Scaling
Classical multidimensional scaling is an important tool for data reduction in many applications. It takes in a distance matrix and outputs low-dimensional embedded samples such that the pairwise distances between the original data points are preserved when the points are treated as deterministic. However, data are often noisy in practice. In such cases, the quality of the embedded samples produced by classical multidimensional scaling starts to break down when either the ambient dimensionality or the noise variance gets larger.
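As a reference point for the classical procedure described above, the following sketch implements the standard double-centering and eigendecomposition steps. It assumes NumPy is available; the function name `classical_mds` and the four example points are illustrative, and the modification proposed in the talk is not shown.

```python
import numpy as np


def classical_mds(distances, n_components=2):
    """Embed points from a pairwise distance matrix via classical MDS."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by rounding or noise.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Noise-free example: four points in the plane (hypothetical data).
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
embedding = classical_mds(dist, n_components=2)
recovered = np.linalg.norm(embedding[:, None] - embedding[None, :], axis=-1)
```

In this noise-free case the embedding reproduces the original pairwise distances up to rotation, reflection, and translation. Adding noise to `dist` is a simple way to observe the degradation mentioned above.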
[CANCELLED] Nonparametric Identified Methods to Handle Nonignorable Missing Data
Topological Data Analysis: Functional Summaries and Locating Cosmic Voids and Filament Loops
Data exhibiting complicated spatial structures are common in many areas of science (e.g. cosmology, biology), but can be difficult to analyze. Persistent homology is a popular approach within the area of Topological Data Analysis (TDA) that offers a way to represent, visualize, and interpret complex data by extracting topological features, which can be used to infer properties of the underlying structures. For example, TDA may be useful for analyzing the large-scale structure (LSS) of the Universe, which is an intricate and spatially complex web of matter.
Anchor Regression: Heterogeneous Data Meets Causality
Many traditional statistical prediction methods mainly deal with the problem of overfitting to the given data set. On the other hand, there is a vast literature on the estimation of causal parameters for prediction under interventions. However, both types of estimators can perform poorly when used for prediction on heterogeneous data. We show that the change in loss under certain perturbations (interventions) can be written as a convex penalty.
Graphical Characterizations of Adjustment Sets
Scientific research is often concerned with questions of cause and effect. For example, does eating processed meat cause certain types of cancer? Ideally, such questions are answered by randomized controlled experiments. However, these experiments can be costly, time-consuming, unethical or impossible to conduct. Hence, often the only available data to answer causal questions is observational.
When Your Big Data Seems Too Small
We discuss several problems related to the challenge of making accurate inferences about a complex phenomenon, given relatively little data. We show that for several fundamental and practically relevant settings, including estimating the intrinsic dimensionality of a high-dimensional distribution, and learning a population of distributions given few data points from each distribution, it is possible to "denoise" the empirical distribution significantly.
Searching for Missing Heritability: A Closer Look at Methodological Issues
Fundamental to the study of inheritance is the partitioning of the total phenotypic variation into genetic and environmental components. Using twin studies, the phenotypic variance-covariance matrix can be parameterized to include an additive genetic effect and shared and non-shared environmental effects. The ratio of the genetic variance component to the total phenotypic variance is the proportion of genetically controlled variation and is termed the ‘narrow-sense heritability’.
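The ratio defined above can be written out directly. This is a minimal sketch assuming the total phenotypic variance is the sum of the three components named in the abstract (additive genetic, shared environmental, non-shared environmental); the function name and the numerical values are hypothetical, and estimating the components from twin data is not shown.

```python
def narrow_sense_heritability(var_additive, var_shared_env, var_nonshared_env):
    """Additive genetic variance as a fraction of total phenotypic variance."""
    components = (var_additive, var_shared_env, var_nonshared_env)
    if any(v < 0 for v in components):
        raise ValueError("variance components must be non-negative")
    total = sum(components)
    if total == 0:
        raise ValueError("total phenotypic variance must be positive")
    return var_additive / total


# Hypothetical variance components: 2.0 / (2.0 + 1.0 + 1.0) = 0.5
h2 = narrow_sense_heritability(2.0, 1.0, 1.0)
```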
On Functional Principal Component Regression
Functional data analysis has been increasingly used in biomedical studies, where the basic unit of measurement is a function, curve, or image. For example, in mobile health (mHealth) studies, wearable sensors collect high-resolution trajectories of physiological and behavioral signals over time. Functional linear regression models are useful tools for quantifying the association between functional covariates and scalar/functional responses, where a popular approach is via functional principal component analysis.
Bayes Shrinkage at GWAS Scale: A Scalable Algorithm for the Horseshoe Prior with Theoretical Guarantees
The horseshoe prior is frequently employed in Bayesian analysis of high-dimensional models, and has been shown to achieve minimax optimal risk properties when the truth is sparse. While optimization-based algorithms for the extremely popular Lasso and elastic net procedures can scale to dimension in the hundreds of thousands, algorithms for the horseshoe that use Markov chain Monte Carlo (MCMC) for computation are limited to problems an order of magnitude smaller. This is due to high computational cost per step and poor mixing of existing MCMC algorithms.
Markov Random Fields, Geostatistics, and Matrix-Free Computation
Since their introduction in statistics through the seminal works of Julian Besag, Gaussian Markov random fields have become central to spatial statistics, with applications in agriculture, epidemiology, geology, image analysis and other areas of environmental science.
Variational Analysis of Empirical Risk Minimization
This talk presents a variational framework for the asymptotic analysis of empirical risk minimization in general settings. In its most general form the framework concerns a two-stage inference procedure. In the first stage of the procedure, an average loss criterion is used to fit the trajectory of an observed dynamical system with a trajectory of a reference dynamical system. In the second stage of the procedure, a parameter estimate is obtained from the optimal trajectory of the reference system.
Reproducibility is imperative for any scientific discovery. More often than not, modern scientific findings rely on statistical analysis of high-dimensional data. At a minimum, reproducibility manifests itself in stability of statistical results relative to "reasonable" perturbations to data and to the model used. Jackknife, bootstrap, and cross-validation are based on perturbations to data, while robust statistics methods deal with perturbations to models. In this talk, a case is made for the importance of stability in statistics.
Hidden Markov Models for Precipitation
Advisors: Peter Guttorp & Jim Hughes