We are pleased to announce that Armeen Taeb, Assistant Professor of Statistics, has received a research grant award from the National Science Foundation for his proposal “False Discovery Control in Non-Standard Settings.”

Abstract:

Controlling the false positive error in model selection is a prominent paradigm for gathering evidence in data-driven science. In model selection problems such as variable selection and graph estimation, models are characterized by an underlying Boolean structure, such as the presence or absence of a variable or an edge. Therefore, false positive error or false negative error can be conveniently specified as the number of variables/edges that are incorrectly included or excluded in an estimated model. However, the increasing complexity of modern datasets has been accompanied by the use of sophisticated modeling paradigms in which defining false positive error is a significant challenge. For example, models specified by structures such as partitions (for clustering), permutations (for ranking), directed acyclic graphs (for causal inference), or subspaces (for principal components analysis) are not characterized by a simple Boolean logical structure, which leads to difficulties with formalizing and controlling false positive error. A new perspective is needed to provide reliable inference in modern data analysis. The methods developed in this project have the potential to impact a wide range of fields as varied as image analysis, geosciences, computational genomics, and many others. The research will engage both graduate and undergraduate students and will be disseminated to a broader audience through the development of new courses. 
 
In this project, the PI develops a generic framework to organize classes of models as partially ordered sets (posets), which leads to systematic approaches for defining natural generalizations of false positive error and methodology for controlling this error. The project aims to use the poset framework to address the following questions: What attributes of the poset structure determine the power and computational complexities of false positive error controlling procedures? How can we exploit specific structures in posets to design powerful model selection methods? How do we provide false discovery rate guarantees over posets? Can we utilize the framework for learning rooted phylogenetic trees and performing highly correlated variable selection?

Congratulations to Armeen on his success!