Session Sheet - AI/Data Informatics: Computational Model Development, Validation, and Uncertainty Quantification: Session II

Sponsored by: TMS Materials Processing and Manufacturing Division, TMS: Computational Materials Science and Engineering Committee
Program Organizers: Saurabh Puri, Microstructure Engineering; Francesca Tavazza, National Institute of Standards and Technology; Dennis Dimiduk, BlueQuartz Software LLC; Darren Pagan, Pennsylvania State University; Kamal Choudhary, Johns Hopkins University; Saaketh Desai, Sandia National Laboratories; Shreyas Honrao, Aionics; Ashley Spear, University of Utah; Houlong Zhuang, Arizona State University

Monday 2:00 PM
March 20, 2023
Room: Cobalt 520
Location: Hilton

Session Chair: Saaketh Desai, Sandia National Laboratories; Amit Verma, Lawrence Livermore National Laboratory

2:00 PM
Addressing Semantic Challenges towards Data Mining using Natural Language Processing: Amit Verma¹; Zhisong Zhang²; Benjamin Glaser²; Robin Kuo²; Jason Zhang²; Nicholas David²; Emma Strubell²; Anthony Rollett²; ¹LLNL; ²Carnegie Mellon University
    Data problems persist across many disciplines of materials science, with a particular dearth of data for high-temperature materials, where most material attributes need to be determined experimentally. To address this challenge, we are working on two key ideas: 1) data retrieval; and 2) recognition systems for identifying key concepts and their dependencies from published literature. The first aims to address the lack of open-access experimental data for various machine learning activities, while the second aims to encode the semantics of the domain for bridging various heterogeneous data sources. Natural Language Processing (NLP) provides a host of solutions in this regard, and this talk focuses on how NLP is being used to develop these tools, with specific examples to support our vision. These include, but are not limited to, BERT language models for entity resolution and conditional random field models for entity extraction.

2:20 PM
A Data Facilitation Platform for Materials Science Literature Mining: Vipul Gupta¹; Florian Pyczak¹; Ingo Schmitt²; ¹Helmholtz-Zentrum Hereon; ²BTU Cottbus-Senftenberg
    Recent developments in the field of data mining (DM) have received considerable attention from the materials science community due to their ability to accelerate the design of new materials. Experimental datasets of materials are published in scientific literature. Mining such literature thus makes it possible to evaluate the combined experimental datasets synergistically. The selection of relevant, machine-readable data is essential for DM. However, digital libraries that act as data sources typically allow only generic searches. Highly specific searches are not possible, such as retrieving literature that contains exclusively TiAl-Creep datasets. Moreover, these digital libraries usually provide data in a human-readable format. This work presents a system that facilitates a generic search-based ingestion of literature from digital libraries, followed by the selection of DM-relevant literature. Besides phrase, facet, and full-text search capabilities, the selection mechanism also allows dataset-aware literature retrieval through figure caption and domain knowledge taxonomy-based semantic searches.

2:40 PM
Compactness Matters: Improving Bayesian Optimization Efficiency of Materials Formulations through Invariant Search Spaces: Sterling Baird¹; Jason Hall²; Ramsey Issa¹; Taylor Sparks¹; ¹University of Utah; ²Northrop Grumman Innovation Systems
    Would you rather search for a line inside a cube or a point inside a square? Physics-based simulations and wet-lab experiments often have symmetries (degeneracies) that allow reducing problem dimensionality or search space, but constraining these degeneracies is often unsupported or difficult to implement in many optimization packages, requiring additional time and expertise. So, are the improvements in efficiency worth the cost of implementation? We demonstrate that the compactness of a search space (to what extent and how degenerate solutions and non-solutions are removed) affects Bayesian optimization search efficiency. Here, we use the Adaptive Experimentation (Ax) Platform by Meta™ and a formulation optimization task with eight or nine tunable parameters, depending on search space compactness. In general, the removal of degeneracy through problem reformulation improves optimization efficiency. We recommend that optimization practitioners in the physical sciences carefully consider the trade-off between implementation cost and search efficiency before running expensive optimization campaigns.
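
As a rough illustration of what removing degeneracy can mean, the sketch below counts candidate formulations before and after mapping each one to a canonical form. It is not the authors' task or their Ax configuration: the three-component recipe, the integer "parts" grid, and the `canonical` helper are all assumptions made for this example. Two degeneracies are assumed: scaling all amounts by a common factor gives the same composition, and component order does not matter.

```python
from fractions import Fraction
from itertools import product


def canonical(amounts):
    """Map raw component amounts to a canonical composition.

    Hypothetical helper: normalizing removes the scale degeneracy
    (fractions sum to one) and sorting removes the permutation
    degeneracy (component order is assumed to be irrelevant).
    """
    total = sum(amounts)
    return tuple(sorted(Fraction(a, total) for a in amounts))


# Naive search space: three components, each with 1 to 4 "parts".
levels = [1, 2, 3, 4]
raw_candidates = list(product(levels, repeat=3))

# Compact search space: one representative per distinct composition.
compact_candidates = {canonical(c) for c in raw_candidates}

print(len(raw_candidates), "raw candidates")
print(len(compact_candidates), "distinct compositions")
```

Every raw candidate that collapses onto the same canonical tuple would be a redundant evaluation for an optimizer that does not know about the symmetry.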

3:00 PM
Using Categorical Structures in Model Analysis & Development: Kalan Kucera¹; John Nychka¹; Glenn Hibbard²; ¹University of Alberta; ²University of Toronto
    Judging the efficacy of the underlying empirical and analytical models of physical systems relies on deconstructing the mathematical structures innate in materials models. We present a category theoretic approach to this challenge, which examines the type and variety of mathematical structures inherent to materials models, using tools of abstraction to illustrate the mathematical forms taken by processes such as irreversible thermodynamic transformations and measurement. The categorization and synthesis of these forms create a scaffolding by which models of physical systems can be connected on an abstract, mathematical basis, providing a rigorous backdrop against which the directions and viability of material design and discovery may be organized. We then demonstrate the conceptual efficacy of the category theoretic approach to multiple facets of materials science and engineering.

3:20 PM
Intrinsic Dimensionality Estimates for Microstructural Data: Megna Shah¹; Veera Sundararaghavan²; Jeff Simmons¹; ¹Air Force Research Laboratory; ²University of Michigan
    Microstructure images, generated by many modalities, contain information about the material, but almost certainly less information than the number of bits they take up on a hard drive. Relatedly, there are a number of established methodologies to estimate the intrinsic dimensionality of data, given a distance or similarity metric. We apply and modify some of these methodologies to estimate the intrinsic dimensionality of various microstructure image datasets with varying distance/similarity metrics, including physics-informed metrics. Understanding the dimensionality of a microstructure dataset has numerous implications for choosing latent dimensions in a neural network that will be trained on microstructure data, and ultimately for navigating the learned microstructure latent space along those dimensions.

3:40 PM Break

4:00 PM
XenonPy: An Open Source Platform for Data-driven Materials Design with Small Data: Stephen Wu¹; Chang Liu¹; Ryo Yoshida¹; ¹The Institute of Statistical Mathematics
    Machine learning has been proven to help accelerate materials discovery in different applications that have accumulated a large amount of data. However, there are many situations in materials science where the available data is limited due to lack of transparency of the specific field or extremely large search space of potential material candidates. Transfer learning is a machine learning technique that aims to improve learning efficiency of a target design task with little data by extracting useful knowledge from a relevant task with "big data". In this work, we developed an open source platform to perform transfer learning in materials science and also provide a large model database that serves as an open knowledge pool. Using this platform, we have made new material discoveries across different material types under different application scenarios, including both inorganic and organic compounds.

4:20 PM
Uncertainty and Domain Quantification in Machine Learning Regression Models for Materials Properties: Dane Morgan¹; Glenn Palmer²; Lane Schultz¹; Yiqi Wang¹; Ryan Jacobs¹; ¹University of Wisconsin-Madison; ²Duke University
    In this talk we discuss our recent work on assessing uncertainties and domain for machine learning regression models that predict materials properties. We demonstrate that a simple calibrated ensemble model approach is quite accurate for predicting the standard deviation of a target value prediction for new data reasonably close to the training data (Palmer et al., npj Computational Materials, 2022). We further demonstrate that a simple distance metric on feature space can be used in conjunction with these error bars to predict when a new data point will be within the domain of a model. These approaches can be applied in a fully automated way to almost any materials property prediction model, providing practical guidance on model errors and domain.
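
The idea of calibrating ensemble error bars can be sketched as follows. This is a simplified illustration, not the procedure from the cited paper: the synthetic data, the bootstrap ensemble of cubic polynomial fits, and the single scale factor (the Gaussian maximum-likelihood rescaling of the ensemble standard deviation on held-out data) are all assumptions chosen to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n):
    """Synthetic 1D regression data (an assumption for this sketch)."""
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, n)
    return x, y


def fit_ensemble(x, y, n_models=20, degree=3):
    """Fit polynomial models to bootstrap resamples of the training set."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), len(x))
        models.append(np.polyfit(x[idx], y[idx], degree))
    return models


def predict(models, x):
    """Return the ensemble mean and the raw ensemble standard deviation."""
    preds = np.array([np.polyval(coeffs, x) for coeffs in models])
    return preds.mean(axis=0), preds.std(axis=0)


x_train, y_train = make_data(200)
x_val, y_val = make_data(100)

models = fit_ensemble(x_train, y_train)
mean_val, sigma_val = predict(models, x_val)

# Calibration: choose one scale factor so that, on the validation set,
# the standardized residuals have unit mean square.
residuals = y_val - mean_val
scale = np.sqrt(np.mean((residuals / sigma_val) ** 2))
calibrated_sigma = scale * sigma_val

print("calibration scale factor:", scale)
```

A scale factor well above one would indicate that the raw ensemble spread underestimates the actual prediction error, which is the situation calibration is meant to correct.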

4:40 PM
A Quantitative Approach to Explainable AI in DIW AM: Jennifer Ruddock¹; Robert Weeks²; Ezra Ameperosa¹; James Hardin¹; Jennifer Lewis²; ¹Air Force Research Lab; ²Harvard University
    Machine learning is an increasingly prevalent tool in automated manufacturing. However, it is important to understand the uncertainty in a prediction, the likelihood that the model is over- or underestimating a value, and how the algorithm arrived at a given prediction. Here, we use Layerwise Relevance Propagation (LRP) to examine the results of convolutional neural networks used to estimate the rheological properties of inks printed by direct ink write 3D printing using image regression analysis. In particular, ink properties such as the yield stress, flow index, and consistency index can be determined from the sharpness of printed corners and the width of deposited filaments. We determine how well the model predicts these properties, while also relating the predictions to the LRP pixel relevance values and their locations. We bring a quantitative approach to LRP in understanding print morphologies in AM.

5:00 PM
The interp5DOF Matlab Toolbox: Grain Boundary Energy Models and Uncertainty Quantification: Oliver Johnson¹; Sterling Baird²; Eric Homer¹; David Fullwood¹; Gus Hart¹; ¹Brigham Young University; ²University of Utah
    Leveraging our recently developed Voronoi fundamental zone (VFZ) framework and Matlab Toolbox for Bayesian inference of grain boundary (GB) structure-property models (interp5DOF), we develop fully anisotropic (5D) models for GB energy in Ni, Al, and Fe, with uncertainty quantification (UQ). We demonstrate computationally efficient methods to enforce physical constraints (crystallographic symmetry, no-boundary singularity, non-negativity, etc.) in the resulting models, and to compute the crystallographic distance between GBs. We evaluate GB energy correlation lengths and find them to be incredibly consistent across materials and crystal systems. Using these models, we identify pairs of GBs that are connected by low-energy pathways through the GB energy landscape, and which may influence microstructural evolution in ways that have not previously been investigated. Finally, we discuss the potential use of these models in mesoscale simulations, similar to the way that interatomic potentials are employed in atomistic simulations.