I will discuss topics at the intersection of machine learning, Bayesian methods, and the estimation of causal effects, with a focus on conditional average treatment effects (CATEs). I make the following claims:

1. Judicious, direct regularization of treatment effect heterogeneity is essential for obtaining low-RMSE estimates of the CATE; in high-noise settings, this can matter more than correctly specifying the functional form of the model.

2. Bayesian decision tree ensembles with causally informed priors that both shrink towards homogeneous treatment effects and incorporate the propensity score typically perform very well in this context relative to meta-learning approaches. In particular, estimation accuracy for CATEs is high, and uncertainty quantification is conservative in the sense of being biased against finding non-existent heterogeneity.

3. Naive applications of Bayesian machine learning approaches typically lead to poor frequentist performance. Naive approaches also tend to be deficient from a subjective Bayesian perspective, in that they imply tightly concentrated prior distributions on certain selection bias parameters about which we actually wish to express ignorance.

These points are illustrated through both data-informed simulations and an analysis of medical expenditure data.
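
As a rough illustration of the first claim, the sketch below compares two linear CATE estimators on simulated high-noise data. It is not the Bayesian tree ensemble approach described above: a ridge penalty stands in for a shrinkage prior, and every detail of the setup (randomized treatment, linear outcome model, sample size, noise level, the penalty `lam`, and all function names) is an assumption made for illustration. The unpenalized fit of the fully interacted model gives the same CATE estimates as separate least-squares fits in each treatment arm, while the second fit penalizes only the treatment-by-covariate interactions, shrinking directly towards a homogeneous effect.

```python
import numpy as np


def fit_penalized(D, y, penalty):
    """Closed-form minimizer of ||y - D b||^2 + sum_j penalty[j] * b[j]^2."""
    return np.linalg.solve(D.T @ D + np.diag(penalty), D.T @ y)


def simulate_once(rng, n=500, p=5, noise_sd=3.0, lam=200.0):
    """Return (RMSE without shrinkage, RMSE with direct shrinkage) for one dataset."""
    X = rng.normal(size=(n, p))
    z = rng.binomial(1, 0.5, size=n)      # randomized treatment (assumption)
    tau = 1.0 + 0.2 * X[:, 0]             # weakly heterogeneous true CATE (assumption)
    y = X @ np.ones(p) + tau * z + rng.normal(scale=noise_sd, size=n)

    # Fully interacted design with columns [1, X, z, z * X].
    D = np.hstack([np.ones((n, 1)), X, z[:, None], z[:, None] * X])

    # No penalty: same CATE estimates as separate OLS fits per arm.
    b_unpenalized = fit_penalized(D, y, np.zeros(D.shape[1]))

    # Direct regularization: penalize only the z * X (heterogeneity) coefficients.
    penalty = np.concatenate([np.zeros(p + 2), np.full(p, lam)])
    b_direct = fit_penalized(D, y, penalty)

    def cate(b):
        # Coefficient on z plus the interaction terms evaluated at each x.
        return b[p + 1] + X @ b[p + 2:]

    def rmse(estimate):
        return float(np.sqrt(np.mean((estimate - tau) ** 2)))

    return rmse(cate(b_unpenalized)), rmse(cate(b_direct))


rng = np.random.default_rng(0)
results = np.array([simulate_once(rng) for _ in range(50)])
rmse_unpenalized, rmse_direct = results.mean(axis=0)
print(f"mean CATE RMSE, no shrinkage:     {rmse_unpenalized:.3f}")
print(f"mean CATE RMSE, direct shrinkage: {rmse_direct:.3f}")
```

In this toy setting both estimators use a correctly specified functional form, so any difference in average RMSE comes from how heterogeneity is regularized rather than from model specification.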