The National Science Foundation today announced a $12.5 million award to support the creation of the Institute for Foundations of Data Science (IFDS), a multi-site research institute led by the University of Washington, with partners at the University of Wisconsin-Madison, the University of California, Santa Cruz, and the University of Chicago. The award was one of two announced today in support of research institutes pursuing far-reaching and potentially transformative research at the frontiers of statistics, mathematics, electrical engineering, and computer science.

At the UW, the project brings together researchers from four academic departments: Electrical and Computer Engineering (ECE), Statistics, Mathematics, and Computer Science and Engineering (CSE). Zaid Harchaoui, Associate Professor of Statistics and one of the co-PIs on the award, noted that “the research program of IFDS addresses new challenging problems in learning and inference that yet echo classical fundamental results in computational statistics, robust statistics, statistical inference, and decision theory, established in the Department of Statistics at the University of Washington by D. Martin, M. Perlman, A. Raftery, W. Stuetzle, and J. Wellner. The team adopts a neoclassical viewpoint to approach these new problems in order to define new notions of optimality, robustness, and calibration relevant for modern-day data science.”

The IFDS award will help the team build new pillars for data science to address modern scientific and technological challenges. The common assumptions of data science, which underlie most widely used theories, methods, algorithms, and software, are now being challenged, and new foundations are needed. The team will join forces to create new theoretical frameworks, methodologies, algorithmic schemes, and software paradigms that address problems involving heterogeneous data, distribution shift, and nonconvex optimization.

For example, the researchers say that the theoretical properties of deep neural networks are still not well understood because appropriate theoretical concepts are lacking. This is a major roadblock to their use in the natural and social sciences, where scientists would like to perform statistical inference with a view toward causal inference. Closing this gap is also especially important for addressing rising concerns about the fairness and interpretability of current data science methods, as the related technologies increasingly affect people’s lives in areas such as healthcare.

“We need the expertise of all core TRIPODS disciplines to understand the mysteries and address the pitfalls of machine learning and artificial intelligence algorithms, and Statistics is a key pillar,” said Maryam Fazel, the lead principal investigator of IFDS and a Professor in the Department of Electrical & Computer Engineering at the University of Washington.

This award builds on the success of the UW’s TRIPODS Phase I award, which was co-led by Sham Kakade (Statistics and CSE) and Maryam Fazel (ECE), as well as the TRIPODS+X:Res award on robot learning, led by Zaid Harchaoui (Statistics). At the heart of these TRIPODS Phase I projects is the synergy between statistical learning and mathematical optimization. Harchaoui has made important contributions to this emerging area through his work on stochastic optimization algorithms for large datasets [1,2,3] and on statistical decision theory in high-dimensional spaces [4,5], with applications to robot learning and marine ecology.

Fundamental advances made at IFDS will be disseminated promptly to the data science community through workshops, summer schools, and hackathons. The institute’s diverse leadership, committed to equity and inclusion, has developed extensive plans for outreach to traditionally underrepresented groups.

“The work of the Institute will drive a paradigm shift in the way information is extracted from data, as well as in the way that information is used to make decisions,” said Abel Rodriguez, Professor and Chair of the Department of Statistics. Rodriguez, who was one of the co-PIs at the University of California, Santa Cruz site and serves as the diversity liaison on the IFDS governing board, recently moved to the UW.

References
[1] <https://www.jmlr.org/papers/v18/17-748.html>
[2] <https://doi.org/10.1137/17M1125157>
[3] <https://papers.nips.cc/paper/7726-a-smoother-way-to-train-structured-prediction-models>
[4] <https://projecteuclid.org/euclid.aos/1558425639>
[5] <https://www.jmlr.org/papers/v20/16-155.html>