On Using Predictive Models for Decisions
We often use predictive models to make a decision afterwards. For instance, we might estimate the number of patients at a medical clinic and then designate resources to serve those patients. The class of accurate predictive models might be quite large (called the "Rashomon effect" by Breiman), which leads to two observations, (i) there may be very accurate, yet very sparse logical models that are naturally useful for decision making, (ii) if the decision problem is coupled over a set of unlabeled points (like scheduling), there may be a large range of decisions resulting from the set of good predictive models.
Considering point (i), I will present Falling Rule Lists (Wang and Rudin, AISTATS 2015). This method is a competitor for CART (Classification and Regression Trees). It produces sparse logical models, which are ordered lists of IF-THEN rules, where the risks monotonically decrease as we go down the list (e.g., IF diabetes THEN risk=50%, ELSE IF hypertension THEN risk=40%, ELSE risk 30%). For medical applications, Falling Rule Lists predicts risk at the same time as it stratifies patients into decreasing risk categories - this makes it very natural to use for decision making, like a medical calculator.
To address (ii) I will present work on Machine Learning with Operational Costs (Tulabandhula and Rudin, Journal of Machine Learning Research, 2013). This paper considers decision problems that are coupled over a small set of unlabeled points. This work is based on the idea that managers have very practical knowledge about the cost of solving their decision problems. This prior knowledge leads to a reduction in the size of the hypothesis space, and better learning theoretic guarantees on the quality of the predictions.
Thus in the first part of the talk I will discuss how predictive modeling can be made to help with decision-making, and in the second part, I will discuss how prior knowledge about decisions can help with prediction. The first part uses Bayesian analysis, and the second part is a statistical learning theory result.