Under low-bandwidth conditions, limited data transmission prevents traditional multimodal motion and identity recognition from reaching its full performance. In this paper, based on data collected under four motion modes (standing, slow walking, running, and walking up and down stairs), 20-dimensional feature values, including tri-axial features and combined-vector features, are computed and analyzed to select the features for the four human motion modes. The selected features are fed into a unified spatio-temporal graph convolutional network (ST-GCN) framework, which extracts global spatio-temporal action features along both the temporal and spatial dimensions and is trained end to end. In terms of model structure, an attention-based feature recalibration module is applied to recalibrate the shared-layer features, yielding a multimodal action and identity recognition model built on the ST-GCN algorithm. Under a specific train/test split, the model achieves an action recognition accuracy of up to 99.76%.
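To make the combined-vector features concrete, the following is a minimal sketch of how a magnitude-based feature can be derived from tri-axial sensor windows. The window layout, the specific statistics, and the function name are illustrative assumptions; the paper's exact 20-dimensional feature set is not reproduced here.

```python
# Sketch: per-axis and combined-vector (magnitude) features from a
# tri-axial sensor window. Feature choices are illustrative assumptions.
import numpy as np

def combined_vector_features(window: np.ndarray) -> np.ndarray:
    """window: (T, 3) array of tri-axial samples (e.g., accelerometer x/y/z)."""
    # Per-axis statistics: mean and standard deviation for x, y, z.
    axis_mean = window.mean(axis=0)              # shape (3,)
    axis_std = window.std(axis=0)                # shape (3,)
    # Combined vector magnitude reduces sensitivity to device orientation.
    magnitude = np.linalg.norm(window, axis=1)   # shape (T,)
    mag_stats = np.array([magnitude.mean(), magnitude.std(),
                          magnitude.min(), magnitude.max()])
    return np.concatenate([axis_mean, axis_std, mag_stats])  # shape (10,)
```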
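The attention-based recalibration of shared-layer features can be sketched as a squeeze-and-excitation-style channel attention module feeding two task heads. This is a minimal sketch assuming PyTorch; the channel counts, reduction ratio, class counts, and module names are assumptions for illustration, not the authors' exact configuration.

```python
# Sketch: SE-style channel attention recalibrating shared ST-GCN features,
# followed by separate action and identity classification heads.
import torch
import torch.nn as nn

class ChannelRecalibration(nn.Module):
    """Squeeze spatio-temporal dimensions, then reweight channels."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, V) — batch, channels, frames, graph nodes.
        w = x.mean(dim=(2, 3))                       # squeeze -> (N, C)
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)   # per-channel weights
        return x * w                                 # recalibrated features

class MultiTaskHead(nn.Module):
    """Shared recalibrated features feed two classifiers (action, identity)."""
    def __init__(self, channels: int, n_actions: int, n_subjects: int):
        super().__init__()
        self.recalibrate = ChannelRecalibration(channels)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.action_head = nn.Linear(channels, n_actions)
        self.identity_head = nn.Linear(channels, n_subjects)

    def forward(self, shared: torch.Tensor):
        z = self.pool(self.recalibrate(shared)).flatten(1)  # (N, C)
        return self.action_head(z), self.identity_head(z)
```

In this arrangement, both tasks are trained end to end against a shared backbone, and the attention weights let each channel of the shared representation be emphasized or suppressed before the task-specific classifiers.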