
Single nucleotide polymorphisms (SNPs) are variations in the genome in which one base pair is substituted for another. Several methods use SNPs to estimate the heritability of phenotypic traits, including both likelihood-based and method-of-moments approaches. Heritability estimates produced by these methods can be influenced by linkage disequilibrium (LD) among the SNPs. Although some proposed approaches aim to account for LD, questions remain in the literature about which estimators are most robust to it. In the first project, we use simulation to investigate different structures and strengths of LD, thereby characterizing the performance of these estimators under different forms of LD. We also derive approximate theoretical biases that result from LD and establish the approximate equivalence of some current methods.
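
As an illustration of what a method-of-moments heritability estimator can look like, the sketch below implements a basic Haseman-Elston-style regression: products of phenotypes for pairs of individuals are regressed on the corresponding entries of a genetic relationship matrix (GRM). This is an assumed, textbook-style example and not necessarily one of the estimators studied in this work; the function names `grm` and `he_regression` are hypothetical, and the phenotype is assumed to be standardized already. Because this GRM weights every SNP equally, it is one place where LD among SNPs can plausibly enter such an estimate.

```python
from itertools import combinations
from math import sqrt


def grm(genotypes):
    """Genetic relationship matrix K = Z Z^T / m.

    `genotypes` is an n x m list of allele counts (rows are individuals,
    columns are SNPs). Each SNP column is centered and scaled before use.
    """
    n = len(genotypes)
    columns = []
    for snp in zip(*genotypes):
        mean = sum(snp) / n
        sd = sqrt(sum((g - mean) ** 2 for g in snp) / n)
        if sd > 0:  # skip monomorphic SNPs, which cannot be scaled
            columns.append([(g - mean) / sd for g in snp])
    m = len(columns)
    if m == 0:
        raise ValueError("no polymorphic SNPs")
    return [[sum(col[i] * col[j] for col in columns) / m for j in range(n)]
            for i in range(n)]


def he_regression(y, K):
    """Slope of y_i * y_j on K_ij over all pairs i < j, with no intercept.

    Assumes `y` has mean 0 and variance 1 (an assumption of this sketch).
    """
    pairs = list(combinations(range(len(y)), 2))
    num = sum(K[i][j] * y[i] * y[j] for i, j in pairs)
    den = sum(K[i][j] ** 2 for i, j in pairs)
    return num / den
```

In a simulation, one might generate genotypes with a chosen LD structure, simulate phenotypes with a known heritability, and compare the value returned by `he_regression(y, grm(genotypes))` with the truth.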

The second project looks beyond sequence information to additional characteristics of the genome, such as chromatin accessibility, that can provide important information about cell types. Single-cell ATAC-seq (scATAC-seq) now enables the measurement of chromatin accessibility in individual cells, but analyzing these data is challenging because they are sparse and high-dimensional. Recent methods have used latent topic models such as latent Dirichlet allocation (LDA) to study scATAC-seq data. We present a method that uses a prior in LDA to incorporate information from existing reference data, with the aim of improving inference in a newly acquired target data set. We show that our method successfully transfers topic semantics from the reference data set onto the target data set, and we provide evidence that it can improve clustering in the target data set relative to clusters produced using an uninformative prior.
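
One simple way reference data could inform an LDA prior is sketched below. It is an assumption made for illustration, not the construction used in this work. Topic-by-feature counts estimated on the reference data are normalized and scaled into Dirichlet concentration parameters for the topic-feature distributions. The function name `informative_topic_prior` and the parameters `strength` and `base` are hypothetical. For scATAC-seq, the features would typically be accessible regions.

```python
def informative_topic_prior(reference_counts, strength=100.0, base=0.01):
    """Build a topics x features matrix of Dirichlet concentration parameters.

    `reference_counts[k][w]` is how often feature w was assigned to topic k
    in the reference data. `strength` controls how strongly the prior pulls
    the target topics toward the reference topics, and `base` keeps every
    entry strictly positive. Both are assumptions of this sketch.
    """
    prior = []
    for row in reference_counts:
        total = sum(row)
        if total <= 0:
            raise ValueError("each reference topic needs a positive count")
        prior.append([base + strength * count / total for count in row])
    return prior
```

With `strength` set to 0 every entry equals `base`, which gives a symmetric, uninformative prior. Larger values tie each target topic more closely to its reference counterpart, which is one route by which topic semantics might carry over.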