Fisher Inversion, Repro Samples Method, and Principled Random Forests
Rapid developments in data science and the demand for interpretable AI call for innovative frameworks to tackle frequently encountered but highly non-trivial inference problems, e.g., those involving discrete or non-numerical parameters and those involving non-numerical data. This talk presents an effective and wide-reaching framework, called the repro samples method, for conducting statistical inference in these problems and more. We develop theory to support the framework and provide effective computing algorithms for problems in which explicit solutions are unavailable. A particular focus is a commonly encountered non-trivial inference problem that involves both discrete/non-numerical and continuous parameters. We propose an effective two-step procedure to make inferences for all parameters, and use the Fisher inversion method to develop a unique matching scheme that turns the disadvantage of lacking theoretical tools for handling discrete/non-numerical parameters into an advantage of improved computational efficiency. The effectiveness of the method is illustrated through a case study: the development of a novel machine learning ensemble tree model, called principled random forests. Specifically, we first construct a confidence set for the underlying (‘true’) tree model that generated (or approximately generated) the observed data. We then obtain a tree ensemble model using the confidence set, from which we derive our inference. The development is principled and interpretable because, first, it is fully theoretically supported and provides frequentist performance guarantees on both inference and prediction; and second, the approach assembles only a small number of trees in the confidence set, so the resulting model is more interpretable. The development is further extended to handle tree-structured conditional average treatment effects in a causal inference setting.
Numerical results demonstrate that our proposed approach outperforms existing single-tree and ensemble tree methods.
Fisher inversion and the repro samples method provide a new toolset for developing interpretable AI and for helping to address the black-box issues in complex machine learning models. The development of principled random forests is our first attempt in this direction.