With the deepening application of artificial intelligence technology in education, intelligent teaching systems have evolved from simple computer-assisted learning tools into complex systems capable of understanding and responding to learners’ emotional states. Based on an intelligent sentiment analysis model, this paper explores how emotion perception moderates learners’ behavioral patterns in AI-assisted education. The study designs GTA-BERT, a bimodal sentiment analysis model that retains the dependency-parsing information of text sentences and fuses text and speech features through masked attention. The model consists of four parts: text feature extraction, speech feature extraction, a DEGCN text-sentiment enhancement module, and masked attention fusion. Through comparative experiments against mainstream sentiment analysis models and a questionnaire survey, the study verifies the model’s effectiveness in sentiment recognition and in regulating learning behavior. The results show that GTA-BERT performs well in multimodal sentiment analysis, achieving Acc-2, F1, Acc2-weak, and Corr values of 93.24, 83.96, 75.01, and 72.67, respectively, the highest among all compared models. The empirical study confirms that AI-assisted instruction directly affects learners’ behavioral patterns, with emotion perception and mood playing an important mediating role. The study concludes that the intelligent sentiment analysis model can effectively identify learning emotions, and that applying emotion perception in the teaching process helps regulate learners’ behavioral patterns and improves the effectiveness of AI-assisted education.
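The abstract names masked attention fusion as the step that combines the text and speech streams but does not specify its form. The following is a minimal, illustrative sketch of one common variant, assuming that text tokens act as queries over speech frames, that a boolean mask marks padded speech frames to be ignored, and that speech features have already been projected to the text feature dimension. The function name `masked_cross_attention`, the omission of learned query/key/value projections, and the residual addition used for fusion are all assumptions made for illustration, not the GTA-BERT implementation.

```python
import math
from typing import List

Vector = List[float]


def masked_cross_attention(text: List[Vector],
                           speech: List[Vector],
                           speech_mask: List[bool]) -> List[Vector]:
    """Hypothetical text-to-speech masked attention with residual fusion.

    Each text vector is used as a query; speech vectors serve as both keys
    and values (assumption: no learned projections). Frames whose mask entry
    is False get a score of -inf, so their softmax weight is exactly 0.
    """
    if not any(speech_mask):
        raise ValueError("at least one speech frame must be unmasked")
    d = len(text[0])
    scale = 1.0 / math.sqrt(d)  # scaled dot-product attention
    fused = []
    for q in text:
        # Masked positions are excluded by assigning -inf before softmax.
        scores = [
            sum(a * b for a, b in zip(q, k)) * scale if keep else float("-inf")
            for k, keep in zip(speech, speech_mask)
        ]
        m = max(scores)  # finite, because at least one frame is kept
        exps = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of speech frames = attended speech context for q.
        attended = [sum(w * k[j] for w, k in zip(weights, speech))
                    for j in range(d)]
        # Residual fusion (assumption): add the speech context to the text vector.
        fused.append([a + b for a, b in zip(q, attended)])
    return fused


if __name__ == "__main__":
    text = [[1.0, 0.0]]
    speech = [[1.0, 0.0], [0.0, 1.0], [100.0, 100.0]]  # last frame is padding
    print(masked_cross_attention(text, speech, [True, True, False]))
```

With the padded third frame masked out, the large values in that frame have no influence, and the single text token's fused vector is roughly `[1.67, 0.33]`. Without the mask, that frame would dominate the softmax, which is the failure mode masking is meant to prevent.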