|About this Abstract
||MS&T21: Materials Science & Technology
||AI for Big Data Problems in Advanced Imaging, Materials Modeling and Automated Synthesis
||Machine Learning Polymer Property Prediction Models with Polymers Represented as Natural Language
||Christopher Benjamin Kuenneth, Rampi Ramprasad
|On-Site Speaker (Planned)
||Christopher Benjamin Kuenneth
Polymer informatics tools have recently been gaining ground as a way to design and discover polymers that meet specific application needs. A critical component of such tools is the conversion of polymers into machine-readable representations (so-called fingerprints). The fingerprinting process has so far relied on handcrafted approaches that capture key chemical and structural features. Recently, within the domain of natural language processing, transformer-based ML models have demonstrated a new, fully ML-based path to obtaining fingerprints of language. Here, we view SMILES strings as a language representation of polymers and use more than 100 million of them to train a transformer-based ML model. The performance of the resulting fingerprints is compared with that of traditional fingerprints using a large polymer property data set. Our new approach achieves prediction performance similar to that of existing state-of-the-art methods, but is faster, more flexible, and allows us to create fully autonomous ML pipelines.
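To make the idea of treating SMILES strings as language concrete, the following is a minimal sketch of the first step such a pipeline would typically need: splitting a polymer SMILES string into tokens and mapping them to integer IDs that a transformer could consume. The abstract does not describe the tokenizer actually used, so the regular expression, the `[*]` convention for repeat-unit endpoints, and the helper names `tokenize_smiles`, `build_vocab`, and `encode` are all illustrative assumptions.

```python
import re

# Assumed token pattern (not taken from the abstract): bracket atoms such as
# [*] or [NH+], two-letter halogens, two-digit ring closures, organic-subset
# atoms, aromatic atoms, bond/branch symbols, and single ring-closure digits.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFI]|[bcnops]|[=#\-+\\/().:~@]|\d"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens, rejecting unrecognized characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def build_vocab(corpus: list[str]) -> dict[str, int]:
    """Assign an integer ID to every token seen in the corpus."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for smiles in corpus:
        for token in tokenize_smiles(smiles):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(smiles: str, vocab: dict[str, int]) -> list[int]:
    """Map a SMILES string to token IDs, using <unk> for unseen tokens."""
    return [vocab.get(token, vocab["<unk>"]) for token in tokenize_smiles(smiles)]


# Hypothetical repeat units, with [*] marking the polymer chain endpoints.
corpus = ["[*]CC[*]", "[*]CC(Cl)[*]", "[*]CC(c1ccccc1)[*]"]
vocab = build_vocab(corpus)
ids = encode("[*]CC(Cl)[*]", vocab)
```

In a transformer-based workflow, ID sequences like `ids` would be fed to the model, and a pooled hidden state would serve as the learned fingerprint in place of a handcrafted one. That model step is omitted here because the abstract gives no architectural details.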