Analysis of Sparse Contingency Tables: Monte Carlo Estimation of Exact P-Values
Published 12/1/1989
Abstract
The traditional practice in the analysis of contingency table has been one of the goodness-of-fit tests (e.g. Pearson's X 2 or the likelihood ratio 0 2) which lean heavily on asymptotic machinery. It is well known that these methods can give spurious results when the table is sparse or when the sample size of the table is small. Although various algorithms have been proposed for exact tests in two-dimensional tables, exact tests in three- and higher-dimensional tables is almost an untouched area. In fact, even the minimal extension from two-dimension to three-dimension poses conceptually new problems. Viewing the Pvvalue as the expectation of a indicator function on a Markov chain whose equilibrium distribution is the same as the distribution of contingency tables with same margins, we propose a method to estimate the P-value by simulating the Markov chain which is constructed by the Metropolis algorithm. Probability functions for multidimensional contigency tables are develeped in the process. The method considerably extends the bounds of computational feasibility of the exact test for contingency table of virtually any dimension. It is fast and can be modified to calculate other statistics of the table. It also has the advantage that it can be to parallel processing. Numerical examples are given to illustrate the method.
KEYWORDS: Contingency tables, exact log-linear iU'-'U,"-i. Markov algorithm, Monte Carlo, probability function.