| About this Abstract | 
   
    | Meeting | MS&T21: Materials Science & Technology | 
   
    | Symposium | Ceramics and Glasses Modeling by Simulations and Machine Learning | 
   
    | Presentation Title | Now On-Demand Only: Information Extraction Pipeline for Glasses: An NLP Based Approach | 
   
    | Author(s) | Vineeth Venugopal, Sourav Sahoo, Mohd Zaki, Nitya Nand Gosvami, N. M. Anoop Krishnan | 
   
    | On-Site Speaker (Planned) | N. M. Anoop Krishnan | 
   
    | Abstract Scope | A large amount of information about materials is scattered across scientific journals, handbooks, patents, textbooks, and other resources. Most of this information resides in text and images and is currently unstructured. Retrieving research papers on particular topics in specialized materials science domains, or extracting information from figure captions, is a non-trivial task. Therefore, to streamline information extraction from research papers, we present latent Dirichlet allocation (LDA) assisted topic labelling to identify glass science papers on the basis of their abstracts. Further, we develop “Caption Cluster Plots” (CCP) to automate information extraction from figure captions. Using both LDA and CCP, we have also developed “Elemental Maps”, which show which chemical elements are mentioned in the abstracts and associated figure captions of which research papers. This pipeline will enable researchers to explore different materials science domains and uncover information hidden in the vast corpora of research articles. |
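
The abstract does not say how the “Elemental Maps” are constructed. As a purely illustrative sketch, and not the authors' implementation, one simple way to record which chemical elements appear in which abstracts or captions is to scan each text for formula-like tokens and build an element-to-document index. The names `ELEMENT_SYMBOLS`, `elements_in`, and `elemental_map` are hypothetical, and the symbol set is a small assumed subset chosen for glass compositions.

```python
import re
from collections import defaultdict

# Assumption: a small illustrative subset of element symbols, not a full periodic table.
ELEMENT_SYMBOLS = {"Si", "O", "Na", "Ca", "B", "Al", "K", "Li", "Mg", "P", "Ge", "Zn"}

# Formula-like token: one or more (symbol + optional digits) groups, e.g. "SiO2", "Na2O".
_FORMULA = re.compile(r"\b(?:[A-Z][a-z]?\d*)+\b")
# A single element symbol: an uppercase letter optionally followed by a lowercase letter.
_SYMBOL = re.compile(r"[A-Z][a-z]?")


def elements_in(text):
    """Return the set of known element symbols found in formula-like tokens of text."""
    found = set()
    for match in _FORMULA.finditer(text):
        symbols = _SYMBOL.findall(match.group())
        # Keep the token only if every symbol in it is a known element,
        # which filters out most ordinary capitalized words.
        if symbols and all(s in ELEMENT_SYMBOLS for s in symbols):
            found.update(symbols)
    return found


def elemental_map(documents):
    """Map each element symbol to the sorted ids of the documents that mention it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for symbol in elements_in(text):
            index[symbol].add(doc_id)
    return {symbol: sorted(ids) for symbol, ids in index.items()}


if __name__ == "__main__":
    # Hypothetical captions, invented for illustration only.
    captions = {
        "paper1": "Raman spectra of Na2O-SiO2 glasses",
        "paper2": "Density of B2O3 glass as a function of temperature",
    }
    for symbol, papers in sorted(elemental_map(captions).items()):
        print(symbol, papers)
```

This sketch is deliberately naive: a standalone capital letter such as a panel label "B" would be counted as boron, so a practical pipeline would need a full symbol table and context-aware filtering.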