With the development of information technology, the traditional mode of Civic and Political Education has become insufficient. This study explores the value of applying affective computing and multimodal learning behavior data mining to Civic and Political Education. Based on constructivist learning theory and multimodal discourse analysis theory, a multimodal classroom environment for Civic and Political Education was constructed, and a multimodal learning behavior analysis model integrating deep learning and a Bayesian network was designed. In the experiments, a hybrid discriminative restricted Boltzmann machine (HDRBM) neural network processed the multimodal data, and a Bayesian network was used to analyze causal relationships in learning. Thirty undergraduates participated in the experiment. For individual students, the proportion of focused emotion identified by the system was 47.8%, close to the 54.2% obtained by manual counting; for the class as a whole, the system identified 46.78% focused emotion against 51.42% by manual counting. Both differences (6.4 and 4.64 percentage points, respectively) are small. Frequency analysis of multimodal behaviors shows that A7 (teacher-oriented) behavior is the most frequent of the students' participatory learning behaviors. Among focused learning behaviors, the ratio of students' gaze on learning aids (R4) fell from above 1 at the beginning of the semester toward 1, which indicates that students' attention to the classroom gradually increased. The results indicate that multimodal sentiment analysis and learning behavior mining can effectively improve the teaching effectiveness of Civic and Political Education and offer new ideas for its innovation.
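The abstract does not specify the HDRBM architecture or training details. As a rough illustration of how a discriminative RBM turns a fused feature vector into an emotion label, the sketch below computes the exact class posterior used in the standard classification-RBM formulation: each label's score is its bias plus the sum of softplus activations of the hidden units, followed by a softmax over labels. Everything here is an assumption for illustration: the function name `class_posterior`, the parameter names `W`, `U`, `c`, `d`, the three emotion labels, and the feature vector are hypothetical, the parameters are random rather than trained, and the hybrid (generative plus discriminative) training objective is not shown.

```python
import math
import random


def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z))
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def class_posterior(x, W, U, c, d):
    """Return p(y | x) for every class y of a discriminative RBM (sketch).

    x: list of n_visible input features (hypothetical fused multimodal features)
    W: n_hidden x n_visible weights between hidden units and features
    U: n_hidden x n_classes weights between hidden units and the one-hot label
    c: n_hidden hidden biases
    d: n_classes label biases
    """
    n_hidden = len(c)
    n_classes = len(d)
    # Pre-activation of each hidden unit from the features alone
    base = [c[j] + sum(W[j][i] * x[i] for i in range(len(x))) for j in range(n_hidden)]
    # Negative free energy of (x, y) for each candidate label y
    scores = [
        d[y] + sum(softplus(base[j] + U[j][y]) for j in range(n_hidden))
        for y in range(n_classes)
    ]
    # Softmax over labels; subtracting the max avoids overflow
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    random.seed(0)
    labels = ["focused", "confused", "bored"]  # hypothetical emotion classes
    n_visible, n_hidden = 4, 3
    # Random, untrained parameters purely for demonstration
    W = [[random.gauss(0, 0.1) for _ in range(n_visible)] for _ in range(n_hidden)]
    U = [[random.gauss(0, 0.1) for _ in labels] for _ in range(n_hidden)]
    c = [0.0] * n_hidden
    d = [0.0] * len(labels)
    x = [0.2, 0.9, 0.1, 0.5]  # hypothetical feature vector for one video frame
    probs = class_posterior(x, W, U, c, d)
    print(labels[probs.index(max(probs))], [round(p, 3) for p in probs])
```

Because the posterior is computed exactly rather than by sampling, per-frame labels can be aggregated directly into proportions such as the share of focused emotion reported above.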