Bayesian Modeling of a High Resolution Housing Price Index
Advisor: Emily Fox
Understanding how housing values evolve over time is important to consumers, real estate professionals, and policy makers. Existing housing indices are computed at a coarse spatial granularity, such as metropolitan regions. This coarse granularity lacks the representational power to encode the fine price dynamics apparent in local markets, such as neighborhoods and census tracts, and therefore leads to distorted price predictions. A challenge in moving to estimates at, for example, the census tract level is the sparsity of spatiotemporally localized house sales observations. Our work addresses the data sparsity challenge by leveraging observations from multiple census tracts discovered to have correlated valuation dynamics. We propose a Bayesian nonparametric approach that builds on the framework of latent factor models to enable a flexible, data-driven method for inferring the clustering of correlated census tracts. We explore methods for scalability and parallelizability of computations, yielding a housing valuation index at the level of census tract rather than zip code, and on a monthly basis rather than quarterly. Our analysis is carried out on a large Seattle metropolitan housing dataset, which includes all house sales records from 1997 to 2013. We further incorporate a non-stationary trend into our Bayesian framework to capture the global effect jointly with the local dynamics. Our further work seeks to define the local neighborhood structure itself, rather than using pre-defined census tract regions. Rather than relying on Euclidean distance, we propose an optimization-based graph algorithm to discover neighborhoods of houses that have similar attributes and are closely connected by roads.
The discovered regions are at a finer scale than census tracts, and even at this scale the methods described above produce a housing index at the hyperlocal neighborhood level, with better predictive performance than the index at the census tract level.