Abstract Scope
Materials datasets, especially those capturing high-temperature properties, pose challenges for learning tasks due to skewed distributions, wide feature ranges, and multimodal behaviors. While tree-based models like XGBoost perform well on many tabular problems, their reliance on piecewise-constant splits limits their ability to capture smooth, long-tailed, or higher-order relationships in materials data. We investigate encoder-decoder models for data transformation, including regularized Fully Dense Networks (FDN-R), Disjunctive Normal Form Networks (DNF-Net), 1D Convolutional Neural Networks (CNNs), Variational Autoencoders (VAEs), and TabNet, a hybrid attention-based model. Results show that although XGBoost remains competitive on simpler tasks, encoder-decoder models, particularly FDN-R and DNF-Net, generalize better on highly skewed targets such as creep resistance across varying dataset sizes. TabNet provides moderate gains but underperforms at the extremes of the target distribution. These findings underscore the importance of matching model architecture to feature complexity and highlight the potential of encoder-decoder models for robust, generalizable prediction of materials properties from compositional data.
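To make the encoder-decoder idea concrete, the following is a minimal PyTorch sketch of a regularized dense encoder-decoder regressor for compositional features. The layer sizes, dropout rate, and the dummy data are illustrative assumptions, not the paper's actual FDN-R configuration.

```python
# Minimal sketch of an encoder-decoder regressor for tabular composition
# features. Layer widths and dropout are hypothetical, not the paper's FDN-R setup.
import torch
import torch.nn as nn

class EncoderDecoderRegressor(nn.Module):
    def __init__(self, n_features: int, latent_dim: int = 16):
        super().__init__()
        # Encoder compresses wide-ranging composition features into a latent code.
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Dropout(0.2),  # regularization, in the spirit of FDN-R
            nn.Linear(64, latent_dim), nn.ReLU(),
        )
        # Decoder maps the latent code to the target property
        # (e.g., a creep-resistance measure).
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

# Usage on dummy data: 128 alloys, 20 compositional features.
model = EncoderDecoderRegressor(n_features=20)
x = torch.randn(128, 20)
y = torch.randn(128, 1).abs()  # non-negative stand-in for a skewed target
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
```

Unlike a tree ensemble's piecewise-constant splits, this learned latent representation varies smoothly with composition, which is one plausible reason such models handle long-tailed targets better.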