Traditional quality management methods suffer from insufficient prediction accuracy and slow response in complex production environments, making it difficult for them to meet the demand for refined management in modern production. This study constructs a quality stability prediction model for the cigarette production process based on Meta-DQN, addressing the deficiencies of traditional methods in small-sample learning and environmental adaptability. A total of 500 samples of working-condition data, including temperature, humidity, and airflow speed, were collected from production line sensors at cigarette factory H. The MQTT communication protocol was used for data transmission, and MinMaxScaler normalization was applied to ensure data consistency. The Meta-DQN prediction model combines the model-agnostic meta-learning (MAML) algorithm from meta reinforcement learning with a deep Q-network (DQN), and achieves fast adaptation to different production tasks through a two-layer loop optimization mechanism. Experimental results show that the coefficients of determination (R²) on the training and test sets reach 0.992 and 0.957, respectively, indicating a significant improvement in prediction accuracy. In the validation of key parameter prediction, the average deviation between the predicted and real values of six process parameters is only 0.212, well below the standard setting of 0.565. Comparison experiments show that the Meta-DQN model converges quickly in the initial operation stage of the equipment and effectively reduces the scrap rate, significantly outperforming the plain DQN algorithm. The method provides an efficient and intelligent solution for cigarette production quality management and has important engineering application value.
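The two-layer loop optimization mentioned above can be illustrated with a minimal MAML sketch. This is not the Meta-DQN model itself: it assumes a toy supervised setting in which each "task" is a line y = a * x with its own slope, and the model is a single scalar weight w. The task distribution, the learning rates alpha and beta, and all function names are hypothetical choices made for illustration. The inner loop adapts w to one task with a single gradient step; the outer loop updates the shared initialization so that this one step works well across tasks.

```python
import random


def mse(w, xs, ys):
    """Mean squared error of the scalar model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w, xs, ys):
    """Derivative of the mean squared error with respect to w."""
    return 2.0 * sum(x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def sample_task(rng, n=5):
    """Hypothetical task: a line y = a * x with a task-specific slope a."""
    a = rng.uniform(1.0, 3.0)
    support_x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    query_x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    support_y = [a * x for x in support_x]
    query_y = [a * x for x in query_x]
    return support_x, support_y, query_x, query_y


def inner_adapt(w, xs, ys, alpha):
    """Inner loop: one gradient step on the task's support set."""
    return w - alpha * mse_grad(w, xs, ys)


def meta_train(iterations=500, tasks_per_batch=4, alpha=0.1, beta=0.05, seed=0):
    """Outer loop: update the shared initialization w across sampled tasks."""
    rng = random.Random(seed)
    w = 0.0  # shared initialization (assumed starting point)
    for _ in range(iterations):
        meta_grad = 0.0
        for _ in range(tasks_per_batch):
            sx, sy, qx, qy = sample_task(rng)
            w_task = inner_adapt(w, sx, sy, alpha)
            # Chain rule through the inner step:
            # d(w_task)/dw = 1 - 2 * alpha * mean(x^2) for this scalar model.
            d_adapt = 1.0 - 2.0 * alpha * sum(x * x for x in sx) / len(sx)
            # Query loss is evaluated at the adapted weight, not at w.
            meta_grad += mse_grad(w_task, qx, qy) * d_adapt
        w -= beta * meta_grad / tasks_per_batch
    return w
```

Calling `meta_train()` returns an initialization that should lie near the middle of the assumed slope range, and a single `inner_adapt` step from it then moves the weight toward the slope of a newly sampled task. In a Meta-DQN setting, the scalar weight would presumably be replaced by the Q-network parameters and the squared error by a temporal-difference loss, but those details are not given in the text above.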