Augmented reality (AR) technology, an important product of recent scientific and technological development, is now widely used in education. This study improves the 3D-ResNet18 network by adding a self-attention layer and a Transformer encoder, yielding a deep learning model for emotion recognition. The model is then combined with AR technology and applied in art education. In experiments on image data collected from students during art classes, the improved 3D-ResNet18 model recognizes the emotions in students' facial expressions with high accuracy: the recognition accuracy for each of confusion, happiness, normality, and boredom exceeds 90%, and the overall recognition accuracy is 0.51% to 13.49% higher than that of the other methods compared, which demonstrates the high-precision emotion recognition performance of the proposed method. When the model was applied in AR art teaching, the overall emotion score recognized for the sample students was about 0.65. This result supports the effectiveness and practicality of combining the emotion recognition model with AR technology. The combined approach can support the diagnosis of student states and classroom conditions, helps teachers adjust the teaching plan in time, and promotes the quality of art education.