Estimating subnational health and demographic indicators using complex survey data
Final Exam presented by Peter Gao
Subnational estimates of health and demographic indicators such as immunization coverage rates and child mortality rates are critical for identifying regional health disparities and guiding policy design. When population data on an outcome of interest are unavailable or incomplete, many countries gather information from a sample of the population using household surveys. These surveys are typically designed to produce national estimates of key indicators and generally do not collect enough data to produce reliable subnational estimates using traditional direct estimation methods, especially when estimating the prevalence of rare events. In this setting, indirect methods that use statistical models to incorporate covariate information or smooth estimates across areas using random effects can be effective for generating more precise estimates. However, national statistical offices and policymakers commonly desire estimators that are robust to model misspecification, making careful selection of methods crucial for producing estimates that are acceptable for dissemination and decision making.
In recent years, geostatistical models that treat quantities of interest as continuous spatial surfaces have become popular among global health researchers for mapping key health indicators, especially for low- and middle-income countries. These approaches often compensate for limited data availability by leveraging advances in spatial modeling and incorporating newly available covariate information derived from satellite imaging, but may fail to account for features of the complex surveys used to collect data, such as informative sampling or cluster effects, potentially leading to biased estimates. On the other hand, traditional small area estimation approaches common in the survey statistics literature are typically specified with careful consideration for survey design, but have historically been adopted in countries where high-quality census data on auxiliary covariates are available and may perform suboptimally in low-data settings.
In this thesis, I propose a suite of methods for estimating subnational health and demographic indicators using complex survey data. First, I propose an area-level model for demographic rates that jointly models the direct estimators and their associated variance estimators and induces spatial smoothing of both means and variances. This method can be viewed as an extension of the Fay-Herriot model, popular in small area estimation, adapted for estimating small area proportions. Second, I outline a smoothed model-assisted estimator for small area means that incorporates unit-level covariate information and smoothing via random effects while accounting for the survey design through survey weights. Finally, I describe a method for incorporating sampling weights when estimating unit-level models in order to address the effects of clustering and informative sampling.