The Megaman project for manifold learning with millions of points, developed by Marina Meilă, James McQueen, Jake VanderPlas and Zhongyue Zhang, has been awarded millions of hours of CPU time at the Argonne Leadership Computing Facility (ALCF) as part of a UW submission to the ALCF Data Science Program (ADSP) at Argonne National Laboratory.

Megaman is being used in the project "Constructing and Navigating Polymorphic Landscapes of Molecular Crystals" (PI: Alexandre Tkatchenko, University of Luxembourg; Co-PIs: Marina Meilă; Alvaro Vazquez-Mayagoitia, ANL; and Robert DiStasio, Jr., Cornell University), one of four proposals selected for the 2017-2018 ALCF Data Science Program.

Molecular crystals play a central role in the search for alternative energy sources and disease-curing pharmaceuticals. Crucial to designing and controlling the physical and chemical properties of a molecular crystal is the fact that molecular solids often have many alternative structures, called polymorphs, that display drastically different characteristics.

The project carries out extensive state-of-the-art crystal structure prediction studies for hundreds of molecular crystals and studies the collective properties of each crystal's 1000-2000 polymorphs. The analysis and interpretation of the space of polymorphs employs unsupervised and semi-supervised machine learning techniques (such as dimensionality reduction) implemented in the megaman package. The goal of the analysis is to discover collective variables involving internal molecular and cell degrees of freedom, which can lead to novel reduced models for molecular crystals.