To improve the effectiveness of fault prediction and health management (PHM) for high-speed train wheels, this paper proposes a data processing framework based on Spark Streaming and Kafka that collects, cleans, and transforms high-speed train PHM source data. Using the processed data, correlation algorithms are applied to identify the factors that influence wheel set wear. Changes in high-speed train wheel size data are complex and are affected by the operating environment and other factors. To account for this, a wheel size prediction model based on VMD-PSO-MKELM is constructed to predict high-speed train wheel set data accurately. For wheel diameter data, the MSE, MAE, and MAPE of the proposed VMD-PSO-MKELM model are 0.0012, 0.0294, and 0.0004%, respectively, with an R² of 0.9968. For flange thickness data, the corresponding MSE, MAE, and MAPE are 0.0081, 0.0741, and 0.0005%, with an R² of 0.9251. On both the wheel diameter and the flange thickness data, the VMD-PSO-MKELM model achieves lower MSE, MAE, and MAPE and a higher R² than the compared ELM, L-ELM, P-ELM, and R-ELM models, demonstrating high prediction accuracy and strong practical applicability.
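The four evaluation metrics reported above (MSE, MAE, MAPE, and R²) can be computed as in the minimal sketch below. It assumes the standard textbook definitions, with MAPE expressed as a percentage. The paper's exact formulas are not stated in this abstract and may differ. The function name `regression_metrics` and the sample wheel diameter values in the usage example are hypothetical and serve only as illustration.

```python
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute MSE, MAE, MAPE (in percent) and R² for paired observations.

    Assumes the standard definitions; this is an illustrative sketch,
    not necessarily the formulas used in the paper.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Residuals between measured and predicted wheel sizes
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE assumes no true value is zero (wheel sizes are strictly positive)
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    # R² = 1 - SS_res / SS_tot
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MSE": mse, "MAE": mae, "MAPE": mape, "R2": r2}


if __name__ == "__main__":
    # Hypothetical wheel diameter measurements (mm) and model predictions
    measured = [900.0, 899.5, 899.0, 898.5]
    predicted = [900.1, 899.4, 899.0, 898.6]
    print(regression_metrics(measured, predicted))
```

Lower MSE, MAE, and MAPE and an R² closer to 1 indicate a better fit, which is the criterion the abstract uses to compare VMD-PSO-MKELM with the ELM, L-ELM, P-ELM, and R-ELM baselines.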