Husky Union Building (HUB)

Registers are increasingly important sources of data to be analyzed. Examples include registers of congenital abnormalities, supermarket purchases, or traffic violations. In such registers, records are created when a relevant event is observed, and they contain the features characterizing the event. Understanding the structure of associations among the features is of primary interest. However, registers often contain no cases in which none of the features is present; therefore, standard multiplicative or log-linear models may not be applicable.

We introduce the Essential Regression model, which provides an alternative to the ubiquitous K-sparse high-dimensional linear regression on p variables. While K-sparse regression assumes that only K components of the observable X directly influence Y, Essential Regression allows for all components of X to influence Y, but mediated through a K-dimensional random vector Z.
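A minimal simulation sketch of the idea, assuming a linear latent-factor form X = AZ + E and Y = β^T Z + ε with Gaussian Z, E and ε. The function name, the dimensions, the dense loading matrix A and the coefficients β are illustrative assumptions, not the specification used in the talk. Every coordinate of X ends up correlated with Y, even though Y depends on the data only through the K-dimensional Z.

```python
import numpy as np

def simulate_essential_regression(n=20_000, p=10, K=3, seed=0):
    """Draw from an assumed linear latent model: X = A Z + E and Y = beta^T Z + eps."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, K))              # latent K-dimensional vector
    A = rng.uniform(0.5, 1.5, size=(p, K))       # dense loadings: every X_j depends on Z
    beta = np.linspace(0.5, 2.0, K)              # hypothetical coefficients of Y on Z
    X = Z @ A.T + rng.standard_normal((n, p))    # observable features with noise E
    Y = Z @ beta + rng.standard_normal(n)        # response depends on Z only
    return X, Y

X, Y = simulate_essential_regression()
# Sample correlation between each feature X_j and the response Y.
corr = np.array([np.corrcoef(X[:, j], Y)[0, 1] for j in range(X.shape[1])])
print(np.round(corr, 2))
```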

In a seminal paper, Robins (1998) introduced marginal structural models (MSMs), a general class of counterfactual models for the joint effects of time-varying treatment regimes in complex longitudinal studies subject to time-varying confounding. He established identification of MSM parameters under a sequential randomization assumption (SRA), which rules out unmeasured confounding of treatment assignment over time.
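MSM parameters are commonly estimated by inverse probability of treatment weighting (IPTW). The sketch below is a deliberately simplified, single-time-point illustration rather than the time-varying setting of the talk: a measured binary confounder L drives both treatment A and outcome Y, the propensity P(A = 1 | L) is treated as known, and all simulated values (including the true effect of 2) are hypothetical. With no unmeasured confounding, the one-period analogue of the SRA, weighting each unit by 1 / P(A | L) recovers the marginal effect, while the naive treated-versus-untreated contrast does not.

```python
import numpy as np

def simulate(n=200_000, seed=1):
    rng = np.random.default_rng(seed)
    L = rng.binomial(1, 0.5, n)                      # measured confounder
    propensity = 0.2 + 0.6 * L                       # P(A = 1 | L), assumed known here
    A = rng.binomial(1, propensity)                  # treatment depends on L only
    Y = 2.0 * A + 3.0 * L + rng.standard_normal(n)   # true marginal effect of A is 2
    return L, A, Y, propensity

def naive_difference(A, Y):
    return Y[A == 1].mean() - Y[A == 0].mean()

def iptw_difference(A, Y, propensity):
    # Weight each unit by the inverse probability of the treatment it actually received.
    w = np.where(A == 1, 1.0 / propensity, 1.0 / (1.0 - propensity))
    treated = np.sum(w * A * Y) / np.sum(w * A)
    control = np.sum(w * (1 - A) * Y) / np.sum(w * (1 - A))
    return treated - control

L, A, Y, propensity = simulate()
naive = naive_difference(A, Y)                # biased upward by confounding through L
iptw = iptw_difference(A, Y, propensity)      # targets the marginal (MSM) effect
print(naive, iptw)
```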

Note 2/7/2018: We are canceling this seminar as a precaution in anticipation of the expected winter storm.


As the pace and scale of data collection continue to increase across all areas of biology, there is a growing need for effective and principled statistical methods for analyzing the resulting data. In this talk, I'll describe two ongoing projects that help fill this gap.

A new standard is proposed for the evidential assessment of replication studies. The approach combines a specific reverse-Bayes technique with prior-predictive tail probabilities to define replication success. The method gives rise to a quantitative measure for replication success, called the sceptical p-value. The sceptical p-value integrates traditional significance of both the original and replication study with a comparison of the respective effect sizes.
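One way to make the construction concrete is the sketch below, under simplifying assumptions: normally distributed effect estimates with known standard errors, a sceptical prior N(0, τ²) centred at zero, and two-sided tail probabilities throughout. The prior variance τ² is chosen so that the original study's posterior is just significant at level α; replication success at level α then means that the replication estimate's prior-predictive tail probability under that prior is at most α. Writing x = z_α², z_o and z_r for the two z-values, and c = σ_o²/σ_r² for the variance ratio, equating the two conditions gives (c − 1)x² + (z_o² + z_r²)x − z_o² z_r² = 0; in the equal-variance case c = 1 the relevant root is x = 1/(1/z_o² + 1/z_r²). The function names are hypothetical, and this simplified version ignores whether the two estimates share the same sign.

```python
from math import sqrt
from statistics import NormalDist

def sceptical_p_value(est_o, se_o, est_r, se_r):
    """Two-sided sceptical p-value (simplified sketch; effect direction is ignored)."""
    zo2 = (est_o / se_o) ** 2          # squared z-value of the original study
    zr2 = (est_r / se_r) ** 2          # squared z-value of the replication study
    c = se_o ** 2 / se_r ** 2          # variance ratio: original / replication
    b = zo2 + zr2
    disc = b * b + 4.0 * (c - 1.0) * zo2 * zr2
    # Root of (c - 1) x^2 + b x - zo2 * zr2 = 0 with 0 < x < min(zo2, zr2),
    # written in a form that stays numerically stable as c -> 1.
    x = 2.0 * zo2 * zr2 / (b + sqrt(disc))
    return 2.0 * (1.0 - NormalDist().cdf(sqrt(x)))

def two_sided_p(est, se):
    """Conventional two-sided p-value of a single study."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(est / se)))

p_s = sceptical_p_value(0.5, 0.2, 0.3, 0.1)
print(p_s, two_sided_p(0.5, 0.2), two_sided_p(0.3, 0.1))
```

Because replication success at level α requires both studies to be individually significant at that level, the sceptical p-value in this sketch is never smaller than either study's own p-value.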

Did you know that your skills in statistics can be applied to ensure natural resources, such as fish, wildlife and even ecosystems, remain resilient into the future? That your love of algebra can take you to wild, remote, and amazing places? That there are careers where you get to collaborate with a wide variety of dedicated scientists working to better understand the world, how it is changing, and what it will be like in the future?

In many applications, investigators monitor processes that vary in space and time, with the goal of identifying temporally persistent and spatially localized departures from a baseline or "normal" behavior. In this talk, I will first discuss a principled Bayesian approach for estimating time-varying functional connectivity networks from brain fMRI data. Dynamic functional connectivity, i.e., the study of how interactions among brain regions change over the course of an fMRI experiment, has recently received wide interest in the neuroimaging literature.
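For contrast with the Bayesian approach described above, the simplest and widely used baseline for dynamic functional connectivity is a sliding-window correlation between two regional time series. The sketch below implements only that baseline, not the speaker's method; the window length and the simulated signals (coupling that switches on halfway through the scan) are arbitrary illustrative choices.

```python
import numpy as np

def sliding_window_correlation(x, y, window):
    """Pearson correlation of x and y within each run of `window` consecutive time points."""
    n = len(x)
    out = np.empty(n - window + 1)
    for t in range(n - window + 1):
        out[t] = np.corrcoef(x[t:t + window], y[t:t + window])[0, 1]
    return out

# Two simulated regional signals: independent in the first half, coupled in the second.
rng = np.random.default_rng(2)
T = 400
shared = rng.standard_normal(T)
x = shared + 0.5 * rng.standard_normal(T)
y = np.where(np.arange(T) < T // 2,
             rng.standard_normal(T),
             shared + 0.5 * rng.standard_normal(T))
r = sliding_window_correlation(x, y, window=40)
```

A fixed window trades temporal resolution against noise in each correlation estimate, which is one motivation for model-based alternatives.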

The asymptotics of the second-largest eigenvalue in random regular graphs (the subject of the "Alon conjecture") were established by Joel Friedman in his celebrated 2004 paper. Recently, Charles Bordenave gave a new proof of this result using the non-backtracking operator and the Ihara-Bass formula. In the same spirit, we have translated Bordenave's ideas to bipartite biregular graphs to calculate the asymptotic value of the second-largest pair of eigenvalues, and we obtain a similar spectral gap result.
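The regular-graph statement can be checked numerically. The sketch below assumes the permutation model, a standard way to generate random d-regular multigraphs for even d (it may contain loops and multi-edges), chosen here for simplicity. It compares the second-largest adjacency eigenvalue with the Ramanujan bound 2√(d − 1); Friedman's theorem says that, with high probability, the second eigenvalue is at most 2√(d − 1) + ε. The function name and sizes are illustrative.

```python
import numpy as np

def permutation_regular_adjacency(n, d, seed=0):
    """Adjacency matrix of a random d-regular multigraph (d even) built from d/2 permutations."""
    assert d % 2 == 0, "the permutation model needs an even degree"
    rng = np.random.default_rng(seed)
    A = np.zeros((n, n))
    for _ in range(d // 2):
        P = np.eye(n)[rng.permutation(n)]   # random permutation matrix
        A += P + P.T                        # each permutation adds degree 2 to every vertex
    return A

n, d = 1000, 6
A = permutation_regular_adjacency(n, d)
eig = np.sort(np.linalg.eigvalsh(A))[::-1]   # eigenvalues in decreasing order
lambda_1, lambda_2 = eig[0], eig[1]
print(lambda_1, lambda_2, 2 * np.sqrt(d - 1))
```

The top eigenvalue equals d because every row sums to d; the spectral gap is the distance between d and lambda_2.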

Non-Gaussian spatial data arise in a number of disciplines. Examples include spatial data on disease incidences (counts), and satellite images of ice sheets (presence-absence). Spatial generalized linear mixed models (SGLMMs), which build on latent Gaussian processes or Markov random fields, are convenient and flexible models for such data and are used widely in mainstream statistics and other disciplines. For high-dimensional data, SGLMMs present significant computational challenges due to the large number of dependent spatial random effects.
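To see where the computational burden comes from, here is a minimal simulation from a Poisson SGLMM, assuming an exponential covariance for the latent Gaussian process and a single covariate (both illustrative choices, as are the function name and parameter values). The n × n covariance matrix and its Cholesky factor, which cost O(n²) memory and O(n³) time, are exactly what becomes prohibitive for high-dimensional data; fitting the model would additionally require integrating over all n dependent random effects.

```python
import numpy as np

def simulate_poisson_sglmm(n=300, beta=(0.5, 1.0), sigma2=1.0, range_=0.2, seed=3):
    """Counts y_i ~ Poisson(exp(beta0 + beta1 * x_i + w(s_i))), w a latent Gaussian process."""
    rng = np.random.default_rng(seed)
    s = rng.uniform(0, 1, size=(n, 2))                  # spatial locations in the unit square
    x = rng.standard_normal(n)                          # one covariate
    dist = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
    Sigma = sigma2 * np.exp(-dist / range_)             # exponential covariance, n x n
    L = np.linalg.cholesky(Sigma + 1e-8 * np.eye(n))    # O(n^3): the bottleneck for large n
    w = L @ rng.standard_normal(n)                      # dependent spatial random effects
    eta = beta[0] + beta[1] * x + w                     # linear predictor on the log scale
    y = rng.poisson(np.exp(eta))                        # observed counts
    return s, x, y, w

s, x, y, w = simulate_poisson_sglmm()
print(y[:10])
```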

Interested in what our graduate students have been working on? Come join us for posters and presentations by the students themselves as they present their research.

Volunteer presenters include:

Data science is at a crossroads. Each year, thousands of new data scientists enter science and technology after broad training in a variety of fields. Modern data science is often exploratory in nature, with datasets being collected and dissected in an interactive manner. Classical guarantees that accompany many statistical methods are often invalidated by their non-standard interactive use, resulting in an underestimated risk of falsely discovering correlations or patterns.
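A small simulation sketch of the problem, with arbitrary sizes: when the response is pure noise, testing one correlation fixed in advance produces false discoveries at roughly the nominal 5% rate, but testing only the feature that looked most correlated after inspecting the data produces them far more often. The p-values use Fisher's z-transform approximation; the function names and settings are illustrative.

```python
import numpy as np
from statistics import NormalDist

def corr_p_value(r, n):
    """Approximate two-sided p-value for a sample correlation via Fisher's z-transform."""
    z = np.arctanh(r) * np.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(float(z))))

def discovery_rates(n=100, p=50, reps=500, alpha=0.05, seed=4):
    """Fraction of null data sets declared 'significant' without and with data-driven selection."""
    rng = np.random.default_rng(seed)
    prespecified = selected = 0
    for _ in range(reps):
        X = rng.standard_normal((n, p))        # p features of pure noise
        y = rng.standard_normal(n)             # response independent of every feature
        Xc, yc = X - X.mean(axis=0), y - y.mean()
        r = Xc.T @ yc / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
        prespecified += corr_p_value(r[0], n) < alpha                  # chosen before looking
        selected += corr_p_value(r[np.argmax(np.abs(r))], n) < alpha   # chosen after looking
    return prespecified / reps, selected / reps

pre, sel = discovery_rates()
print(pre, sel)
```

With 50 independent null features, the selected test rejects whenever any of them is nominally significant, so its error rate is close to 1 − 0.95^50 rather than 0.05.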

Argo floats measure sea water temperature and salinity in the upper 2,000 m of the global ocean. The statistical analysis of the resulting spatio-temporal data set is challenging due to its nonstationary structure and large size. I propose mapping these data using locally stationary Gaussian process regression where covariance parameter estimation and spatio-temporal prediction are carried out in a moving-window fashion. This yields computationally tractable nonstationary anomaly fields without the need to explicitly model the nonstationary covariance structure.
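The moving-window idea can be sketched in one dimension. For each prediction location, only the data within a fixed window are used; covariance parameters are estimated there by maximizing the Gaussian likelihood over a small grid, and the prediction is a kriging estimate with a local constant mean. The exponential covariance, the grid, the window width, the known noise variance and the toy signal are all illustrative assumptions; the Argo analysis itself works in space and time with its own covariance model.

```python
import numpy as np

def exp_cov(a, b, sill, length):
    """Exponential covariance between 1-D location vectors a and b."""
    return sill * np.exp(-np.abs(a[:, None] - b[None, :]) / length)

def neg_log_lik(r, K):
    """Gaussian negative log-likelihood of residuals r (up to a constant), via Cholesky."""
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r))
    return 0.5 * r @ alpha + np.sum(np.log(np.diag(L)))

def moving_window_predict(s, y, s_new, half_width, noise_var,
                          sills=(0.1, 0.5, 1.0, 2.0), lengths=(0.1, 0.3, 1.0, 3.0)):
    """For each target location, estimate covariance parameters from nearby data only, then krige."""
    preds = np.empty(len(s_new))
    for i, s0 in enumerate(s_new):
        in_window = np.abs(s - s0) <= half_width
        s_loc, y_loc = s[in_window], y[in_window]
        mu = y_loc.mean()                              # local constant mean
        r = y_loc - mu
        nugget = noise_var * np.eye(len(s_loc))
        # Local parameter estimation: grid search over (sill, length) by likelihood.
        sill, length = min(((a, b) for a in sills for b in lengths),
                           key=lambda p: neg_log_lik(r, exp_cov(s_loc, s_loc, *p) + nugget))
        K = exp_cov(s_loc, s_loc, sill, length) + nugget
        k0 = exp_cov(np.array([s0]), s_loc, sill, length)[0]
        preds[i] = mu + k0 @ np.linalg.solve(K, r)     # kriging predictor with local mean
    return preds

def truth(t):
    return np.sin(t ** 1.5)      # oscillations speed up with t: a nonstationary signal

rng = np.random.default_rng(5)
s = np.sort(rng.uniform(0, 10, 400))
y = truth(s) + 0.1 * rng.standard_normal(len(s))
s_new = np.linspace(0.5, 9.5, 19)
pred = moving_window_predict(s, y, s_new, half_width=0.75, noise_var=0.01)
```

Because the parameters are re-estimated in every window, the fitted covariance can adapt to the faster oscillations at larger t without a global nonstationary covariance model.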

Many important causal questions concern interactions between units, also known as interference. Examples include interactions between individuals in households, students in schools, and firms in markets. Standard analyses that ignore interference can often break down in this setting: estimators can be badly biased, while classical randomization tests can be invalid. In this talk, I present recent results on estimation and testing for two-stage experiments, which are powerful designs for assessing interference.
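A minimal sketch of a two-stage design, assuming equal-sized clusters (e.g., households) and two treatment saturations: stage 1 randomizes clusters to a saturation, and stage 2 treats that fraction of units within each cluster by complete randomization. The outcome model, the saturations and the simple average-of-cluster-means estimators are hypothetical illustrations, not the estimators or tests presented in the talk.

```python
import numpy as np

def two_stage_assignment(cluster_sizes, saturations=(0.3, 0.7), seed=6):
    """Stage 1: randomize clusters to saturations. Stage 2: treat that fraction within each cluster."""
    rng = np.random.default_rng(seed)
    levels = np.resize(np.array(saturations), len(cluster_sizes))  # balanced saturation labels
    cluster_sat = rng.permutation(levels)                          # stage 1
    treatment = []
    for size, sat in zip(cluster_sizes, cluster_sat):
        z = np.zeros(size, dtype=int)
        z[rng.choice(size, size=int(round(sat * size)), replace=False)] = 1  # stage 2
        treatment.append(z)
    return cluster_sat, treatment

def estimate_effects(y, treatment, cluster_sat, low=0.3, high=0.7):
    """Direct effect within high-saturation clusters; spillover on untreated units (high vs low)."""
    groups = list(zip(y, treatment, cluster_sat))
    direct = [yc[z == 1].mean() - yc[z == 0].mean() for yc, z, a in groups if a == high]
    untreated_high = [yc[z == 0].mean() for yc, z, a in groups if a == high]
    untreated_low = [yc[z == 0].mean() for yc, z, a in groups if a == low]
    return np.mean(direct), np.mean(untreated_high) - np.mean(untreated_low)

cluster_sat, treatment = two_stage_assignment([10] * 200)
rng = np.random.default_rng(7)
# Hypothetical outcomes: direct effect 1.0, plus a spillover of 2.0 times the cluster saturation.
y = [1.0 * z + 2.0 * a + rng.standard_normal(len(z)) for z, a in zip(treatment, cluster_sat)]
direct, spillover = estimate_effects(y, treatment, cluster_sat)
print(direct, spillover)
```

Varying the saturation across clusters is what makes the spillover contrast estimable; a design that treats the same fraction everywhere cannot separate it from the baseline.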

Paul Gustafson, Department of Statistics, University of British Columbia

Hierarchical Bayesian Modelling for Survival Data

Hierarchical Bayes models can be flexible tools for the analysis of failure time data. This will be illustrated by two examples. The first example is in a clinical trials context, when there are several response times for each patient, and many patients at each clinical centre. Frailties are used to model both across-patient variability and across-centre variability.

We study maximum likelihood estimation for exponential families that are multivariate totally positive of order two (MTP2). Such distributions appear in the context of ferromagnetism in the Ising model and in various latent models, for example, Brownian motion tree models used in phylogenetics. We show that maximum likelihood estimation for MTP2 exponential families is a convex optimization problem. For quadratic exponential families such as Ising models and Gaussian graphical models, we show that MTP2 implies sparsity of the underlying graph without the need for a tuning parameter.
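For Gaussian distributions the MTP2 condition has a simple matrix characterization: a Gaussian vector is MTP2 if and only if its precision matrix K = Σ⁻¹ has no positive off-diagonal entries (K is an M-matrix). The helper below checks this condition; its name and numerical tolerance are illustrative. The zero pattern of K is the graph of the Gaussian graphical model, which is where the sparsity statement above applies.

```python
import numpy as np

def is_mtp2_gaussian(Sigma, tol=1e-10):
    """True if N(0, Sigma) is MTP2, i.e. every off-diagonal entry of inv(Sigma) is <= 0."""
    K = np.linalg.inv(Sigma)
    off_diag = K[~np.eye(len(K), dtype=bool)]
    return bool(np.all(off_diag <= tol))

# Path graph 1 - 2 - 3: nonpositive off-diagonal precision entries, so MTP2.
K_path = np.array([[2.0, -1.0, 0.0],
                   [-1.0, 2.0, -1.0],
                   [0.0, -1.0, 2.0]])
Sigma_path = np.linalg.inv(K_path)

# A positive off-diagonal precision entry (negative partial correlation): not MTP2.
Sigma_neg = np.linalg.inv(np.array([[2.0, 0.5],
                                    [0.5, 2.0]]))
print(is_mtp2_gaussian(Sigma_path), is_mtp2_gaussian(Sigma_neg))
```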


Rotational post hoc transformations have traditionally played a key role in enhancing the interpretability of factor analysis. Regularization methods also serve to achieve this goal by prioritizing sparse loading matrices. In this work, we bridge these two paradigms with a unifying Bayesian framework. Our approach deploys intermediate factor rotations throughout the learning process, greatly enhancing the effectiveness of sparsity inducing priors.


We discuss two recent results concerning disease modeling on networks. The infection is assumed to spread via contagion (e.g., transmission over the edges of an underlying network). In the first scenario, we observe the infection status of individuals at a particular time instant, and the goal is to identify a confidence set of nodes that contains the source of the infection with high probability.
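A minimal sketch of the data-generating side of the first scenario, assuming a discrete-time susceptible-infected (SI) contagion on a grid graph; the transmission probability, number of steps and graph are illustrative. The candidate-set function is a simple Jordan-center heuristic (infected nodes whose largest distance to the rest of the infected set is near the minimum), included only for illustration; it is not the confidence-set procedure discussed in the talk and carries no coverage guarantee.

```python
import random
from collections import deque

def grid_graph(m):
    """Adjacency lists of an m x m grid graph with nodes labelled (row, column)."""
    adj = {}
    for i in range(m):
        for j in range(m):
            adj[(i, j)] = [(i + di, j + dj) for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                           if 0 <= i + di < m and 0 <= j + dj < m]
    return adj

def simulate_si(adj, source, steps, p_transmit, seed=8):
    """Each step, every infected node infects each susceptible neighbour with probability p_transmit."""
    rng = random.Random(seed)
    infected = {source}
    for _ in range(steps):
        newly = {v for u in infected for v in adj[u]
                 if v not in infected and rng.random() < p_transmit}
        infected |= newly
    return infected

def bfs_distances(adj, start):
    """Shortest-path distances from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def jordan_candidate_set(adj, infected, slack=1):
    """Infected nodes whose maximum distance to other infected nodes is within slack of the minimum."""
    ecc = {}
    for u in infected:
        d = bfs_distances(adj, u)
        ecc[u] = max(d[v] for v in infected)
    best = min(ecc.values())
    return {u for u, e in ecc.items() if e <= best + slack}

adj = grid_graph(21)
infected = simulate_si(adj, (10, 10), steps=6, p_transmit=0.5)
candidates = jordan_candidate_set(adj, infected, slack=1)
print(len(infected), sorted(candidates))
```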