Abstract Scope |
We present a modular framework that uses large language models (LLMs) to automatically extract and structure materials data from unstructured scientific literature, in alignment with the FAIR (Findability, Accessibility, Interoperability, and Reusability) data principles. The workflow uses custom prompts to convert PDFs of diverse formats into structured datasets that capture key properties, including alloy composition, yield strength, tensile strength, elongation, and additive manufacturing parameters. Because the pipeline relies on inference-only LLM APIs (e.g., Google Gemini 2.5 Flash), it requires neither extensive local computational resources nor model retraining. A retrieval-augmented module uses embeddings to match user queries with dataset entries. LLM-based reasoning then supports physics-informed interpolation and proxy selection, enabling accurate predictions from limited data. Validated on extensive datasets, the framework supports automated literature surveillance, inverse materials design, and hybrid computational-experimental research, advancing AI-driven materials discovery.
|
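The retrieval step described in the abstract, in which a user query is matched to dataset entries through embeddings, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the `AlloyRecord` fields, the `embed`, `cosine`, and `retrieve` helpers, and the numeric values, which are placeholders and not measured data. A toy bag-of-words vector stands in for a learned embedding model so that the example runs without external services.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class AlloyRecord:
    """One structured dataset entry (hypothetical schema, not the authors')."""
    alloy: str
    process: str
    yield_strength_mpa: float
    tensile_strength_mpa: float
    elongation_pct: float

    def as_text(self) -> str:
        # Text that gets embedded for retrieval.
        return f"{self.alloy} {self.process}"


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a stand-in for a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, records: list[AlloyRecord], k: int = 1) -> list[AlloyRecord]:
    """Return the k records whose embedded text is most similar to the query."""
    q = embed(query)
    ranked = sorted(records, key=lambda r: cosine(q, embed(r.as_text())), reverse=True)
    return ranked[:k]


# Placeholder entries: the property values are illustrative, not measurements.
RECORDS = [
    AlloyRecord("Ti-6Al-4V", "laser powder bed fusion", 1000.0, 1100.0, 8.0),
    AlloyRecord("AlSi10Mg", "laser powder bed fusion", 250.0, 400.0, 6.0),
    AlloyRecord("316L stainless steel", "directed energy deposition", 450.0, 600.0, 40.0),
]

if __name__ == "__main__":
    for hit in retrieve("Ti-6Al-4V laser powder bed fusion", RECORDS, k=2):
        print(hit.alloy, hit.process)
```

In a pipeline of the kind the abstract describes, `embed` would presumably be replaced by calls to an embedding model, and the retrieved records would be passed to the LLM as context for interpolation or proxy selection; that wiring is not shown here.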