With the rapid development of educational informatization, traditional vocal music teaching faces problems such as a lack of personalized teaching resources and one-dimensional learning evaluation. Building a personalized vocal music teaching system that integrates audio and text features through data fusion technology can effectively improve teaching outcomes and the learning experience. In this study, the MFCC method is used to extract audio features, TF-IDF is used to extract text features, and an EFFC multimodal fusion algorithm with HMM constraints is designed to fuse the two modalities effectively. The experimental results show that MFCC audio feature extraction achieves an accuracy of 0.972, significantly outperforming the other feature extraction methods; the HMM-EFFC fusion algorithm reaches an accuracy of 0.9635 and an F1-score of 0.9676, outperforming the comparison algorithms; and in the system application test, the accuracy of learning-interest recognition reaches 98.37% while CPU utilization is only 27.53%. The study demonstrates that a personalized vocal music teaching system based on a data fusion algorithm can effectively improve teaching effectiveness and the learning experience, and offers a new approach to innovation in vocal music teaching.
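As a rough illustration of the two feature-extraction branches described above, the sketch below computes MFCC audio features with librosa and TF-IDF text features with scikit-learn, then concatenates them into a single feature vector. The concatenation is only a minimal stand-in for feature-level fusion; the paper's HMM-constrained EFFC algorithm is not reproduced here, and the synthetic test tone and toy lesson descriptions are illustrative placeholders rather than the study's data.

```python
import numpy as np
import librosa
from sklearn.feature_extraction.text import TfidfVectorizer

# --- Audio branch: MFCC features ---
# A 2-second synthetic 440 Hz tone stands in for a recorded vocal exercise;
# librosa's default frame/window settings are assumed, not the paper's.
sr = 22050
y = 0.5 * np.sin(2 * np.pi * 440.0 * np.arange(sr * 2) / sr).astype(np.float32)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape: (13, n_frames)
audio_vec = mfcc.mean(axis=1)                        # pool frames -> 13-dim vector

# --- Text branch: TF-IDF features over toy lesson descriptions ---
docs = [
    "breath support and legato phrasing",
    "diction exercises for vowel placement",
    "breath control drills for sustained notes",
]
tfidf = TfidfVectorizer()
text_mat = tfidf.fit_transform(docs).toarray()
text_vec = text_mat[0]                               # vector for the first document

# --- Simple feature-level fusion: concatenation as a placeholder
# for the HMM-constrained EFFC fusion used in the study ---
fused = np.concatenate([audio_vec, text_vec])
print(fused.shape)
```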