About this Abstract
MS&T21: Materials Science & Technology
Ceramics and Glasses Modeling by Simulations and Machine Learning
Now On-Demand Only: Information Extraction Pipeline for Glasses: An NLP Based Approach
Vineeth Venugopal, Sourav Sahoo, Mohd Zaki, Nitya Nand Gosvami, N. M. Anoop Krishnan
On-Site Speaker (Planned)
N. M. Anoop Krishnan
A large amount of information about materials is scattered across scientific journals, handbooks, patents, textbooks, and other resources. Most of this information is contained in text and images and is currently unstructured. Retrieving research papers on particular topics in specialized materials science domains, or extracting information from figure captions, is a non-trivial task. Therefore, to streamline information extraction from research papers, we present latent Dirichlet allocation (LDA) assisted topic labelling to identify glass science papers on the basis of their abstracts. Further, we develop “Caption Cluster Plots” (CCP) to automate information extraction from figure captions. Using both LDA and CCP, we have also developed “Elemental Maps”, which show which chemical elements are mentioned in the abstracts of which research papers and in their associated figure captions. This pipeline will enable researchers to explore different materials science domains and extract hidden information from the vast corpora of research articles.
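As a rough illustration of the idea behind an elemental map, the following minimal Python sketch indexes which chemical elements appear in formula-like tokens of abstracts or captions. It is not the authors' implementation: the function names `elements_in_text` and `elemental_map`, the regex heuristic, and the sample texts are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Symbols of the 118 known chemical elements.
ELEMENTS = set("""
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu
Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs
Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl
Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh
Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og
""".split())

# A formula-like token is a run of "Symbol + optional count", e.g. SiO2, Na2O.
FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def elements_in_text(text):
    """Return the set of element symbols found in formula-like tokens.

    Heuristic (an assumption of this sketch): a token counts only if it
    contains a digit or at least two symbols, so ordinary words such as
    "In" or "He" are skipped.
    """
    found = set()
    for token in re.findall(r"[A-Za-z0-9]+", text):
        if not FORMULA.fullmatch(token):
            continue
        symbols = re.findall(r"[A-Z][a-z]?", token)
        has_digit = any(ch.isdigit() for ch in token)
        if len(symbols) < 2 and not has_digit:
            continue
        if all(sym in ELEMENTS for sym in symbols):
            found.update(symbols)
    return found


def elemental_map(docs):
    """Map each element symbol to the sorted ids of the texts mentioning it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for sym in elements_in_text(text):
            index[sym].add(doc_id)
    return {sym: sorted(ids) for sym, ids in index.items()}


if __name__ == "__main__":
    # Made-up sample texts, for illustration only.
    sample = {
        "paper_a": "Sodium silicate glasses (Na2O-SiO2) were melted.",
        "paper_b": "Fig. 2: Raman spectra of B2O3 and SiO2 glass.",
    }
    for symbol, papers in sorted(elemental_map(sample).items()):
        print(symbol, papers)
```

The heuristic is deliberately simple and will misfire on acronyms made up entirely of element symbols (for example, "CCP" parses as C, C, P) and on labels such as "S1". A real pipeline would need a more careful chemical-entity recognizer.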