The rapid development of media convergence has made the creation and dissemination of popular music freer, bringing art closer to everyday life and narrowing the distance between audiences and music creators. This article reviews how the communication characteristics and interaction modes of popular music have changed in the era of media convergence, and examines the specific forms that audience behavior takes in the process of music communication. Audiences who participate in music short-video communication on the DY short-video platform are taken as the research object, and their dynamic behavioral characteristics are captured to produce data for audience behavior prediction. A Conv3D-GRU model, which combines a 3D convolutional neural network (3D-CNN) with the gated recurrent unit (GRU) from recurrent neural networks, is constructed to predict dynamic audience behavior in music communication. The results show that, compared with GMSDR, the proposed model improves RMSE and MAE by 13.08% and 14.77%, respectively, and its PCC reaches as high as 97.79%. The Conv3D-GRU model also achieves better binary classification error and accuracy, and its overall dynamic analysis efficiency reaches 87.44%. Combining neural network techniques with audience behavior prediction helps to expand the reach of music communication in the converged media environment.
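The abstract reports RMSE, MAE and PCC without defining them. As a reading aid, the sketch below computes the three metrics from their standard textbook definitions. The function names are hypothetical, and the assumption that a reported "improvement" means the relative reduction in error against the baseline is ours, not something the abstract states.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def pcc(y_true, y_pred):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def relative_improvement(baseline_error, model_error):
    """Assumed reading of 'improved by X%': percentage reduction in error
    relative to the baseline. This interpretation is an assumption."""
    return (baseline_error - model_error) / baseline_error * 100.0
```

Under this reading, a baseline error of 2.0 and a model error of 1.5 would correspond to a 25% improvement. These numbers are purely illustrative and are not taken from the paper.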