Session Sheet - AI/Data Informatics: Applications and Uncertainty Quantification at Atomistics and Mesoscales: Session IV

Sponsored by: TMS Materials Processing and Manufacturing Division, TMS: Computational Materials Science and Engineering Committee
Program Organizers: Kamal Choudhary, National Institute of Standards and Technology; Garvit Agarwal, Argonne National Laboratory; Wei Chen, University at Buffalo; Mitchell Wood, Sandia National Laboratories; Vahid Attari, Texas A&M University; Oliver Johnson, Brigham Young University; Richard Hennig, University of Florida

Tuesday 2:00 PM
March 16, 2021
Room: RM 33
Location: TMS2021 Virtual

Session Chair: Sukriti Manna, Argonne National Laboratory; Noah Paulson, Argonne National Laboratory

2:00 PM
Fast Crystal Structure Reconstruction and Prediction Method: Based on X-ray Diffraction Dataset and Neural Network: Cheng-Che Tung¹; Yan-Zhen Chen¹; Yuan-Yu Lin²; Nan-Yow Chen³; An-Cheng Yang³; Po-Yu Chen¹; ¹National Tsing Hua University; ²National Chiao Tung University; ³National Center for High-Performance Computing
    The existing X-ray diffraction (XRD) data acquisition and analysis process is time-consuming and requires the assistance of a database. In this study, we propose a machine-learning-based workflow that performs crystal system classification and structure regression directly from XRD patterns. We obtained the theoretically calculated structures and XRD patterns of 123,904 crystals through the Materials Project API. The XRD patterns were used as the dataset to train a neural network by supervised learning. We achieved more than 92% validation accuracy in classification, while the mean-square error in regression can be lower than 0.15. Once the XRD data are input, the 3D structure of the crystal can be predicted and reconstructed rapidly, even if the composition is unknown. This study provides a framework for making effective use of massive materials datasets and has the potential to be extended to many microstructure forecasting tasks.

2:20 PM
Finding and Sharing Atomistic Materials Data and Software with the NIST Materials Resource Registry: Chandler Becker¹; Raymond Plante¹; Laura Bartolo²; Robert Hanisch¹; James Warren¹; Gretchen Greene¹; ¹Material Measurement Laboratory, National Institute of Standards and Technology; ²Center for Hierarchical Materials Design, Northwestern University
    With more resources (software, repositories, datasets, etc.) being developed for atomistic materials science, it can be difficult for users to understand what is available and where to find it. The NIST Materials Resource Registry (NMRR, https://materials.registry.nist.gov) is one way to catalog and link these resources in a federated manner that provides for greater access, interoperability, and assessment while still allowing resource owners to maintain their data and access policies. We will discuss the contents of the system, how it fits into the larger ecosystem of atomistic materials data, and plans for future development.

2:40 PM
Accelerating High Throughput Materials Simulation Studies Using Machine Learning Based Application Programming Interface (API): Jason Gibson¹; Stephen Xie¹; Richard Hennig¹; ¹University of Florida
    Materialsweb.org is an online database of density functional theory (DFT) calculations emphasizing 2D materials, with thousands of electronic structure calculations and multiple GASP runs of select systems. We present an API that uses this data to facilitate the computation of various proven ML representations, including Smooth Overlap of Atomic Positions and symmetry functions, which require only the structure of the material, or ML descriptors such as MAGPIE, which require only the chemical composition. These descriptors can then serve as inputs to pre-trained ML models that use neural networks, random forests, and kernel ridge regression to predict potential energy surfaces and scalar properties such as formation energy and band gap. A structure search with GASP produces thousands of configurations for a particular system and, in turn, thousands of data points. This data is used to train the ML models, allowing accurate predictions for select systems from structural information alone. The electronic structure data cover a diverse set of materials systems, allowing predictions for a variety of materials using only information about the chemical composition. All software will be freely available under the open-source Apache License 2.0.

3:00 PM
Coupling Machine Learning and Global Structure Optimization in GASP 2.0: Stephen Xie¹; Shreyas Honrao¹; Venkata Kolloru¹; Richard Hennig¹; ¹University of Florida
    We present the second iteration of the Genetic Algorithm for Structure Prediction (GASP), which adds support for predicting structures on substrates as well as acceleration with machine-learned surrogate models. GASP-Python, first released in 2016, is a grand-canonical evolutionary algorithm for global structure optimization. Here, we demonstrate the effectiveness of coupling GASP with a surrogate model for formation energy, which we fit and improve on-the-fly as the search progresses. As the algorithm produces candidate structures through genetic operations like crossover, the surrogate model is used to predict their ground-state formation energies. By eliminating candidates belonging to previously-explored, high-energy basins of attraction, this machine-learning approach reduces the number of expensive energy evaluations required to explore the energy landscape. We also compare different choices of representations used to encode the relevant physical information into machine-readable inputs. Finally, we demonstrate the approach on bulk and low-dimensional material systems.
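    The surrogate-screening idea described above can be sketched as follows. This is a minimal toy illustration, not the GASP implementation: the one-dimensional energy function, the nearest-neighbor surrogate, the cutoff rule, and all function names are assumptions made for this example.

```python
import math
import random


def expensive_energy(x: float) -> float:
    """Toy stand-in for an expensive (e.g. DFT) energy evaluation."""
    return 0.1 * x * x + math.sin(3.0 * x)


def surrogate_predict(x, evaluated, k=3):
    """Hypothetical surrogate: mean energy of the k nearest evaluated points.

    Because it reads the growing list of evaluated points, it is
    effectively refit on the fly as the search progresses.
    """
    nearest = sorted(evaluated, key=lambda p: abs(p[0] - x))[:k]
    return sum(e for _, e in nearest) / len(nearest)


def run_search(generations=30, pop_size=8, seed=0):
    rng = random.Random(seed)
    # The initial population is always evaluated with the expensive function.
    evaluated = [
        (x, expensive_energy(x))
        for x in (rng.uniform(-5.0, 5.0) for _ in range(pop_size))
    ]
    skipped = 0
    for _ in range(generations):
        evaluated.sort(key=lambda p: p[1])
        parents = evaluated[:pop_size]
        cutoff = parents[-1][1]  # energy of the worst current parent
        for _ in range(pop_size):
            a, b = rng.sample(parents, 2)
            # Crossover (average of two parents) plus a small Gaussian mutation.
            child = 0.5 * (a[0] + b[0]) + rng.gauss(0.0, 0.5)
            # Skip candidates the surrogate places in a high-energy region.
            if surrogate_predict(child, evaluated) > cutoff:
                skipped += 1
                continue
            evaluated.append((child, expensive_energy(child)))
    best = min(evaluated, key=lambda p: p[1])
    return best, len(evaluated), skipped


best, n_expensive, n_skipped = run_search()
print(f"best x={best[0]:.3f}, energy={best[1]:.3f}")
print(f"expensive evaluations: {n_expensive}, skipped by surrogate: {n_skipped}")
```

    Every skipped candidate is one expensive evaluation avoided; in a real search the surrogate would be a trained regression model over a structural representation rather than a nearest-neighbor average.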

3:20 PM
Harnessing Materials Data and Simulation Capabilities for the Accelerated Discovery of Photocathode Materials: Evan Antoniuk¹; Yumeng Yue¹; Yao Zhou¹; Peter Schindler¹; W. Schroeder²; Theodore Vecchione³; Bruce Dunham⁴; Piero Pianetta³; Evan Reed¹; ¹Stanford University; ²University of Illinois at Chicago; ³SLAC; ⁴SLAC
    The recent development of open-source computational databases has enabled data-driven approaches to identify candidate materials for various applications. As an illustrative example of the potential of these data-driven approaches, we will highlight our efforts in computationally screening for photocathode materials for use in hard X-ray free-electron lasers. Past efforts to discover photocathode materials have primarily relied on trial-and-error approaches with very low throughput. Informed by the available experimental data, we develop a generalizable density functional theory-based photoemission model that is suitable for rapidly identifying candidate photocathode materials. With the aid of this model, we calculate the photoemission properties of over 10,000 bulk crystals, creating several orders of magnitude more photoemission data than before. We then screen this dataset to discover hundreds of candidate photocathode materials. Through close partnerships with experimental collaborators, we will discuss the potential for experimental realization of these newly discovered photocathode materials.

3:40 PM
De Novo Design of Therapeutic Agents Against COVID-19 Using Artificial Intelligence: Srilok Srinivasan¹; Rohit Batra¹; Henry Chan¹; Ganesh Kamath²; Mathew Cherukara¹; Subramanian Sankaranarayanan¹; ¹Argonne National Laboratory; ²Dalzielfiver LLC
    Despite the vast chemical space (billions) that could potentially be explored for therapeutic agents, we remain severely limited in the search owing to the high computational cost of the popular screening procedures based on docking simulations. In addition, these screening procedures are limited to already known chemical spaces. Here, we present a de novo design strategy that leverages artificial intelligence to discover new ligands targeting the spike protein (S-protein) of SARS-CoV-2 at its host receptor region or at the S-protein:angiotensin-converting enzyme 2 (ACE2) receptor interface. Our workflow integrates a Monte Carlo Tree Search (MCTS) algorithm with a multi-task neural network (MTNN) and recurrent neural networks (RNN) to sample the vast chemical space. We generate several new biomolecules that outperform FDA and non-FDA biomolecules from existing databases. Although we focus on therapeutic biomolecules, our AI strategy is broadly applicable to the accelerated design and discovery of any chemical molecules with user-desired functionality.

4:00 PM
AI Guided Discovery of Self-assembly Peptide Sequences using Monte Carlo Tree Search and Coarse-grained Simulations: Rohit Batra¹; Troy Loeffler¹; Henry Chan¹; Srilok Srinivasan¹; Christopher Fry¹; Subramanian Sankaranarayanan¹; ¹Argonne National Lab
    Peptide materials have a wide array of functions, including tissue engineering, surface coatings, catalysis, and sensing. This class of biopolymer is composed of sequences drawn from the 20 naturally occurring amino acids. As the peptide sequence length increases, so does the searchable sequence space (a trimer has 20³ or 8,000 possible sequences, and a pentamer has 20⁵ or 3.2 million). Empirically, peptide design is guided by structural propensity tables, hydrophobicity scales, or other desired properties and typically yields <10 peptides per study, barely scratching the surface of the search space. Here, we combine machine learning techniques, such as Monte Carlo tree search and random forest, with coarse-grained molecular dynamics (MD) simulations to efficiently search large spaces of trimer, pentamer, and octamer peptide sequences for those with high self-assembly propensity. Subsequent experiments on the identified sequences support our findings and demonstrate the capability of this approach for peptide design.
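    The sequence-space sizes quoted above follow from simple counting, assuming each position is chosen independently from the 20 naturally occurring amino acids. A short sketch (the function name is illustrative) that also covers the octamer case mentioned in the abstract:

```python
NUM_AMINO_ACIDS = 20  # naturally occurring amino acids


def sequence_space_size(length: int) -> int:
    """Number of distinct peptide sequences of the given length."""
    return NUM_AMINO_ACIDS ** length


sizes = {
    name: sequence_space_size(n)
    for name, n in [("trimer", 3), ("pentamer", 5), ("octamer", 8)]
}
for name, size in sizes.items():
    print(f"{name}: {size:,} sequences")
# trimer: 8,000 sequences
# pentamer: 3,200,000 sequences
# octamer: 25,600,000,000 sequences
```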