Abstract Scope
Predicting crystal structure has long been a challenging problem in the physical sciences. Recent computational methods have predicted crystal structures successfully, but they remain limited in scope and computationally expensive. We explored the trade-off between breadth and accuracy in building a machine learning model that predicts across any crystal structure. From the Pearson Crystal Database, we extracted 24,913 unique chemical formulas of phases existing between 290 and 310 K. These 24,913 formulas span 10,711 unique crystal structures, referred to as entry prototypes. Common entry prototypes may have hundreds of chemical compositions, whereas the vast majority are represented by fewer than ten unique compositions. To include all data in our predictions, entry prototypes lacking a minimum number of representatives were relabeled as “Other”. By setting this minimum to 150, 100, 70, 40, 20, and 10, we explored how limiting class sizes affected model performance.
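The relabeling step can be illustrated with a minimal Python sketch. It assumes one prototype label per unique chemical formula and counts how many formulas share each label. The function names `relabel_rare_prototypes` and `summarize_thresholds`, as well as the labels `P1` to `P7` and their counts, are hypothetical toy data, not values from the Pearson Crystal Database. Only the threshold values come from the abstract.

```python
from collections import Counter

# Minimum class sizes listed in the abstract.
THRESHOLDS = (150, 100, 70, 40, 20, 10)


def relabel_rare_prototypes(labels, min_count, other="Other"):
    """Return a copy of `labels` in which every entry prototype with fewer
    than `min_count` representatives is replaced by `other`."""
    counts = Counter(labels)
    return [lab if counts[lab] >= min_count else other for lab in labels]


def summarize_thresholds(labels, thresholds=THRESHOLDS):
    """For each threshold, report (number of classes, size of the "Other" class)."""
    summary = {}
    for min_count in thresholds:
        counts = Counter(relabel_rare_prototypes(labels, min_count))
        # Counter returns 0 for a missing key, so this is safe when nothing is relabeled.
        summary[min_count] = (len(counts), counts["Other"])
    return summary


if __name__ == "__main__":
    # Hypothetical toy data: prototype label -> number of unique formulas.
    sizes = {"P1": 200, "P2": 120, "P3": 80, "P4": 50, "P5": 25, "P6": 12, "P7": 3}
    labels = [name for name, n in sizes.items() for _ in range(n)]
    for min_count, (n_classes, n_other) in summarize_thresholds(labels).items():
        print(f"min_count={min_count:>3}: {n_classes} classes, {n_other} labeled Other")
```

Relabeling, rather than discarding, keeps every formula in the dataset. Raising the threshold produces fewer, better-populated classes at the cost of a larger and more heterogeneous “Other” class.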