Abstract Scope
Machine learning models require data that are available, complete, consistent, accurate, and numerous. Although experimental data are usually accurate, they are often too scarce to support meaningful deep learning approaches. As an alternative, synthetic data generated by high-throughput molecular dynamics simulations can offer large, consistent datasets. However, because of their limited accuracy, these data do not always agree perfectly with experiments. This makes it challenging to combine experimental and simulation data directly within universal, unifying datasets. Here, we present a new "data fusion" approach that simultaneously leverages the advantages of experimental and simulation data, wherein the two data sources mutually inform, augment, and advance each other. We demonstrate that our fused model systematically outperforms models trained solely on experimental (or simulation) data.