This study explores whether inaccurate but automatically obtained labels can be used to build an extensive dataset for training deep learning models. This capability is critical for intelligent manufacturing in the Industry 4.0/Industry 5.0 era, in which massive data can be collected automatically through the Internet of Things (IoT) from different operations, at different locations, and at different times, referred to as IoT-D^3 in this paper. We use weld penetration monitoring as a demonstration case for three reasons: penetration occurs underneath the workpiece, so measuring it directly during welding is not feasible or practical; deep learning models are needed to derive it from complex but measurable surface phenomena; and the “directly unmeasurable” accurate labels needed to train such models are too expensive to obtain and are not collected automatically during manufacturing. This study therefore addresses a fundamental question: can a deep learning model be trained from inaccurate labels that are easily collected and transmitted automatically? We formulate the question as follows: if a deep learning model can be trained from accurate penetration labels, can it also be trained from inaccurate labels to the same or even better accuracy? To answer it, we show its similarity to the standard least squares problem, propose zero mean as the condition on the labeling inaccuracy, and show that the inaccuracy can be overcome by increasing the data size. For the experimental demonstration, we propose the welding current in gas tungsten arc welding (GTAW) as a substitute, inaccurate penetration label, propose an experiment design method to better assure that its inaccuracy is “zero-mean”, and filter the current to decrease the labeling inaccuracy by reducing the effect of weld pool dynamics.
We find that the filtered current, used as an inaccurate label, can train the same deep learning model structure to monitor the penetration with the same accuracy when the data size is increased tenfold. The feasibility of training deep learning models for intelligent manufacturing is thus also demonstrated through the weld penetration monitoring case. In addition, we propose using a large amount of inaccurate labels to pre-train a model and a smaller set of accurate labels to calibrate the mapping from the current to the penetration and to fine-tune the model. This pre-training-based method, which is part of the foundation of the generative pre-trained transformer (GPT), achieved better accuracy using 10 percent of the accurate labels by effectively taking advantage of the large set of inaccurate labels. While pre-training has enjoyed tremendous success in GPT models in the unsupervised setting, we demonstrate its effectiveness in the supervised learning setting with inaccurate labels.
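
The least squares analogy can be illustrated with a toy numerical sketch. It is not the model or data used in this study: the linear model, the true slope and intercept, the Gaussian noise level, and all function names below are assumptions chosen only for illustration. The sketch fits a line to labels corrupted by zero-mean noise and compares the average slope error at two data sizes.

```python
import random


def fit_line(xs, ys):
    """Closed-form ordinary least squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def slope_error(n, noise_std, true_a=2.0, true_b=1.0, seed=0):
    """Absolute slope error when labels carry zero-mean Gaussian noise.

    true_a, true_b and the Gaussian noise model are illustrative assumptions.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    # Inaccurate labels: the true value plus zero-mean noise.
    ys = [true_a * x + true_b + rng.gauss(0.0, noise_std) for x in xs]
    a, _ = fit_line(xs, ys)
    return abs(a - true_a)


def mean_slope_error(n, noise_std, trials=50):
    """Average the slope error over several independent noisy datasets."""
    return sum(slope_error(n, noise_std, seed=s) for s in range(trials)) / trials


if __name__ == "__main__":
    for n in (100, 10000):
        print(n, mean_slope_error(n, noise_std=1.0))
```

In this toy setting the average slope error shrinks as the number of noisy samples grows, which is the behavior the zero-mean condition is meant to secure. A biased (non-zero-mean) label error would not average out in the same way.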