
Modern data often exhibit complex structures, such as heterogeneity and spatial and/or temporal dependence, that challenge classical modeling assumptions and frameworks. Bayesian nonparametric (BNP) models are powerful tools for addressing these challenges: they enable flexible modeling structures that adapt to the complexity of the data and provide uncertainty estimation. This dissertation proposes several BNP methods applicable to a wide range of statistical learning problems in regression, clustering, and density estimation, with applications in fields including global health and financial econometrics.

In the first project, we propose a novel model that integrates the Bayesian additive regression trees (BART) prior into the Gaussian process spatial model. The model targets spatial prediction problems in which covariate effects tend to be nonlinear and require flexible modeling. To improve computational scalability, we incorporate the integrated nested Laplace approximation (INLA) into the MCMC update routine. The model is studied through simulation examples and applied to anthropometric survey data from Kenya, which were collected under a stratified two-stage sampling design.

In the second project, we apply the dependent Dirichlet process (DDP) to model the temporal dynamics of the multivariate Hawkes process (MHP), a point process model used to capture mutually exciting behavior in temporal event sequences. Our model allows flexible and adaptive modeling of the excitation functions while borrowing information across dimensions. Two computational methods, MCMC and stochastic variational inference, are developed. The model's behavior is examined in simulation studies, in which it outperforms benchmarks in terms of estimation accuracy. The model is then applied to the order flow of high-frequency financial markets, where it provides insight into the excitation patterns in real order flow data.

In the third project, we explore a foundational issue in Bayesian mixture modeling: the ability to consistently estimate the number of mixture components when the kernel distribution is misspecified. While Dirichlet process mixtures (DPMs) offer great flexibility for density estimation, they are inconsistent for estimating the number of components. We first illustrate the issue through a series of parametric toy examples, using both theoretical and empirical results. We then present two potential solutions. The first is a two-level hierarchical model that combines a mixture of finite mixtures (MFM) at the top level with DPMs at the bottom level. The second explores prior elicitation and post-hoc component merging strategies.