About this Abstract |
Meeting |
MS&T25: Materials Science & Technology |
Symposium |
Enhancing the Accessibility of Machine Learning-Enabled Experiments |
Presentation Title |
Hypothesis Formation and Predictive Modeling of 2D Perovskite Spacer Cations Using Retrieval Augmented LLMs and Deep Kernel Learning |
Author(s) |
Jordan Marshall, Elham Foadian, Sheryl Sanchez, Utkarsh Pratiush, Rushik Desai, Mahshid Ahmadi, Sergei Kalinin, Arun Kanakkithodi |
On-Site Speaker (Planned) |
Jordan Marshall |
Abstract Scope |
In this work, we introduce a dynamic, hypothesis-driven framework that connects large language models (LLMs) and machine learning to accelerate the discovery of novel spacer cations for quasi-2D perovskite materials. By combining Retrieval-Augmented Generation (RAG)-powered literature mining with predictive modeling, we map underexplored regions of chemical space with greater speed and precision. Our pipeline rapidly transforms sprawling scientific literature into structured, machine-learning-ready datasets. We will share how we identified Google's NotebookLM as the optimal extraction tool, designed a rich molecular descriptor set blending cheminformatics and DFT features, and trained a Deep Kernel Learning model that fuses graph embeddings with uncertainty-aware prediction. We will also explore how active learning strategies prioritized new spacer candidates for experimental validation. This talk will focus on the challenges and breakthroughs in scaling LLM-driven hypothesis formation and discuss how bridging natural language understanding with predictive modeling is reshaping materials discovery. |
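The abstract does not say which acquisition rule the active learning step uses. As an illustration only, the sketch below ranks candidates by an upper confidence bound (UCB), one common way to turn uncertainty-aware predictions into a shortlist for experiments. It assumes a trained model (for example, the Deep Kernel Learning model) has already supplied a predicted mean and standard deviation per candidate. The `Candidate` class, the `spacer_*` labels, the numeric values, and the `beta` parameter are all hypothetical and are not taken from the authors' pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str    # hypothetical spacer-cation label
    mean: float  # predicted target property (assumed to come from a trained model)
    std: float   # predictive standard deviation from the same model


def ucb_score(c: Candidate, beta: float = 2.0) -> float:
    """Upper confidence bound: rewards high predictions and high uncertainty."""
    return c.mean + beta * c.std


def select_batch(candidates: List[Candidate], k: int, beta: float = 2.0) -> List[Candidate]:
    """Return the k candidates with the highest UCB score, best first."""
    ranked = sorted(candidates, key=lambda c: ucb_score(c, beta), reverse=True)
    return ranked[:k]


# Made-up predictions, used only to show the ranking behaviour.
pool = [
    Candidate("spacer_A", mean=0.80, std=0.05),
    Candidate("spacer_B", mean=0.60, std=0.30),
    Candidate("spacer_C", mean=0.75, std=0.10),
    Candidate("spacer_D", mean=0.40, std=0.10),
]

picked = select_batch(pool, k=2)
print([c.name for c in picked])
```

With `beta=2.0`, the uncertain `spacer_B` ranks first even though its predicted mean is lower than that of `spacer_A`. Setting `beta=0.0` reduces the rule to pure exploitation of the predicted mean. Other acquisition functions, such as expected improvement, would slot into the same place.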