Drawing on matrix arithmetic and the concept of "big civic and political education", this paper studies how to optimize the project-based teaching mode of physical education courses in higher vocational colleges. At the level of teaching design, a project-based teaching mode for physical education courses is constructed, and civic and political elements are integrated into the whole process of sports skills training. For action recognition, the inter-frame difference method is used for motion detection, and the Fourier transform and kernel principal component analysis are used for feature extraction and dimensionality reduction of the images. A convolutional neural network (CNN), which extracts hierarchical spatial features, and a long short-term memory (LSTM) network, which processes temporal features, are then combined into a CNN-LSTM hybrid neural network that recognizes sports actions. The results show that the CNN-LSTM hybrid model outperforms the CNN-only and LSTM-only models on the performance metrics and has a lower action-capture error, which indicates that the combined model performs better. In addition, the proposed model captures the spatio-temporal relationships between time series and joints more accurately and quickly. When evaluating how standard different actions are, the action recognition error of the CNN-LSTM model is more than 20% lower than that of the traditional model, which shows that the proposed model also recognizes different sports actions well.
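
As an illustration of the inter-frame difference step mentioned above, the following is a minimal sketch in Python with NumPy, not the implementation used in this paper. It assumes grayscale frames stored as 8-bit arrays; the function names, the threshold of 25, and the minimum pixel count of 10 are hypothetical values chosen only for the example.

```python
import numpy as np


def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Mark pixels whose absolute grayscale change exceeds `threshold`.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    # Cast to a signed type so the subtraction cannot wrap around in uint8.
    prev = prev_frame.astype(np.int16)
    curr = curr_frame.astype(np.int16)
    return np.abs(curr - prev) > threshold


def motion_detected(prev_frame, curr_frame, threshold=25, min_pixels=10):
    """Report motion when at least `min_pixels` pixels changed (assumed rule)."""
    mask = frame_difference_mask(prev_frame, curr_frame, threshold)
    return int(mask.sum()) >= min_pixels


# Synthetic example: a bright 10x10 block appears in the second frame.
prev_frame = np.zeros((64, 64), dtype=np.uint8)
curr_frame = prev_frame.copy()
curr_frame[20:30, 20:30] = 200

mask = frame_difference_mask(prev_frame, curr_frame)
moving_pixels = int(mask.sum())
```

In a pipeline of the kind described here, the resulting mask would only locate the moving region; feature extraction and the CNN-LSTM recognition stages would operate on the frames afterwards.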