Abstract Scope |
Artificial intelligence (AI) probes data for high-dimensional trends that are hard to identify by conventional analysis. Thus, the key to all AI methods, from natural language processing to computer vision to deep learning, is data. However, for many AI applications, the quantity and quality of data required for optimal outcomes are not well understood. One solution is to err on the side of data quantity, amassing large, homogeneous data sets. While this may be viable in the social media realm, it is less feasible for physical science and engineering problems, where data are expensive and often heterogeneous. Fortunately, physical data collected by scientists have several advantages: they are selected for their known relevance to the problem, bounded by a physical basis, expertly acquired, and rich in information. Using examples from microstructural characterization, we will survey the factors that should be considered when designing a materials science data set for AI analysis. We will evaluate the relative importance of data size, data type, and data quality. One encouraging observation is that excellent AI outcomes can often be obtained with surprisingly small data sets. |