Session Sheet - ICME Case Studies: Successes and Challenges for Generation, Distribution, and Use of Public/Pre-Existing Materials Datasets: Public Dataset Construction and Metadata Tagging

Sponsored by: TMS Materials Processing and Manufacturing Division, TMS: Integrated Computational Materials Engineering Committee
Program Organizers: Stephen DeWitt, Oak Ridge National Laboratory; Vikas Tomar, Purdue University; James Saal, Citrine Informatics; James Warren, National Institute of Standards and Technology

Monday 2:00 PM
February 28, 2022
Room: 254A
Location: Anaheim Convention Center

Session Chair: James Saal, Citrine Informatics

2:00 PM  Invited
Added Value and Increased Organization: Capturing Experimental Data Provenance in Materials Commons 2.0: Tracy Berman¹; Brian Puchala¹; Glenn Tarcea¹; John Allison¹; ¹University of Michigan
    Capturing the provenance associated with experimental data continues to be a formidable barrier to developing materials datasets. While provenance can be automatically collected and stored behind the scenes in computational studies, in experimental studies the responsibility falls upon the scientists, technicians, and students in the labs. Automatically uploading all information collected on an instrument, or forcing users to fill out forms, risks diluting the system with low-quality data and misinformation. This talk will describe the approach used to capture experimental data provenance in Materials Commons 2.0. The speaker will also discuss how working with Materials Commons has changed how they allocate time for experiments, the organizational benefits of formatting data for ingestion into Materials Commons, and the barriers that remain when sharing data.

2:30 PM  Invited
Generating, Sharing, and Using Halide Perovskite Exploratory Synthesis Data to Discover New Materials: Joshua Schrier¹; ¹Fordham University
    Over the past four years, we’ve developed a Robotic-Accelerated Perovskite Investigation and Discovery (RAPID) system to make and characterize halide perovskites via inverse temperature crystallization and antisolvent vapor diffusion methods. Simultaneously, we’ve developed a general-purpose, open-source data management system, ESCALATE (Experiment Specification, Capture and Laboratory Automation Technology), which allows humans and algorithms to specify experiments, converts those plans into instructions for human operators and robots, captures collected data and metadata for reuse, augments those data with cheminformatics and other analyses, and facilitates data export for sharing and machine learning. In this talk, I will describe the RAPID+ESCALATE technology stack and its deployments across multiple laboratories and experiment types. I will then discuss case studies about how we’ve used this system to enhance data sharing in publications, improve experimental reproducibility, and discover new scientific insights using the comprehensive data and metadata records.

3:00 PM  Invited
Challenges in Producing, Curating, and Sharing Large Multimodal, Multi-institutional Data Sets for Additive Manufacturing: Lyle Levine¹; Brandon Lane²; Carelyn Campbell²; Gerard Lemson²; Edwin Schwalbach³; Megna Shah³; ¹The Ohio State University; ²National Institute of Standards and Technology; ³Air Force Research Laboratory
    The additive manufacturing benchmark series (AM Bench) provides the AM community with rigorous measurement datasets for model validation that are permanently archived and freely available. In addition, challenge problems are posed to the modeling community to evaluate the state of the art in AM simulation. Planning and executing these measurements poses numerous challenges, but developing the necessary data management and data sharing systems is equally important. Questions to be addressed include: How do data collection and sharing challenges impact the benchmark choices? How do we track samples using persistent identifiers? How can we curate the data and metadata and enable users to explore terabyte-sized, multimodal data sets? How do data choices affect communication with challenge problem participants and evaluation of their simulation results? Although workable solutions to these and other questions and challenges have been developed, work continues on improved solutions that are easy to use and maintain.

3:30 PM Break

3:50 PM
A Validation Framework for Microstructure-sensitive Fatigue Simulation Models: Ali Riza Durmaz¹; Nikolai Arnaudov²; Erik Natkowski²; Petra Sonnweber-Ribic²; Sebastian Münstermann³; Chris Eberl¹; Peter Gumbsch¹; ¹Fraunhofer IWM; ²Robert Bosch GmbH; ³RWTH Aachen
    Fatigue crack initiation under very-high-cycle-fatigue (VHCF) conditions is highly susceptible to microstructural extrema. VHCF life simulation therefore depends on micromechanical crack initiation models. While such computational models exist, their systematic validation is difficult, because experimental data at the microstructure scale are scarce and costly to obtain and validation methodologies are lacking. To address this, mesoscale specimens of EN 1.4003 ferritic steel were tested in a bending-resonant fatigue setup that allows sensitive damage detection. The experiment was mimicked in a sub-modeling simulation that embeds the measured microstructure into the specimen geometry, to which the experimental boundary conditions are applied. An elastic continuum simulation of the specimen geometry imposes the load on the embedded microstructure, whose deformation is evaluated by phenomenological crystal plasticity FE. The simulated mechanical fields are compared with damage locations obtained by semantic segmentation of experimental micrographs. This open-access framework enables statistical validation of user subroutines and serves as a benchmark for future modeling approaches.

4:10 PM  Invited
NOW ON-DEMAND ONLY - Hard Fought Lessons on Open Data and Code Sharing and the Terra Infirma of Ground Truth: Jason Hattrick-Simpers; ¹
    The use of machine learning (ML) in the physical sciences has stimulated the discovery of exciting new phase change materials, amorphous alloys, and catalysts. But even scientifically sound AI models are only as dependable as the labels and values upon which they are built. The continued success of these methods relies upon the availability of open data, metadata, and scientific code that are findable, accessible, interoperable, and reusable (F.A.I.R.). I will discuss our successes and failures in creating the first F.A.I.R. multi-institution combinatorial dataset and code repository. I will also discuss the tenuousness of ground truth, the need for openly preserving expert disagreement within scientific data sets, and challenges associated with aggregating data from the open literature. This will drive home the difficulties in forming and capturing expert consensus, the impact of consensus variance on ML model evaluation, and the need to recreate important datasets that are born digital.