Under low-bandwidth conditions, limited data transmission prevents traditional multimodal motion and identity recognition from reaching its full performance. In this paper, based on data collected under four motion modes (standing, slow walking, running, and walking up and down stairs), 20-dimensional feature values, including tri-axial features and combined-vector features, are computed and analyzed to select the features for the four human motion modes. The selected features are fed into a unified spatio-temporal graph convolutional network (ST-GCN) framework, which extracts global spatio-temporal action features along both the temporal and spatial dimensions and is trained end to end. In terms of model structure, an attention-based feature recalibration module is applied to recalibrate the shared-layer features, yielding a multimodal action and identity recognition model built on the ST-GCN algorithm. Under a specific train/test split, the model achieves an action recognition accuracy of up to 99.76%.
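To make the combined-vector features concrete, the following is a minimal sketch of how a magnitude-based feature can be derived from tri-axial sensor windows. The window layout, the specific statistics, and the function name are illustrative assumptions; the paper's exact 20-dimensional feature set is not reproduced here.

```python
# Sketch: per-axis and combined-vector (magnitude) features from a
# tri-axial sensor window. Feature choices are illustrative assumptions.
import numpy as np

def combined_vector_features(window: np.ndarray) -> np.ndarray:
    """window: (T, 3) array of tri-axial samples (e.g., accelerometer x/y/z)."""
    # Per-axis statistics: mean and standard deviation for x, y, z.
    axis_mean = window.mean(axis=0)              # shape (3,)
    axis_std = window.std(axis=0)                # shape (3,)
    # Combined vector magnitude reduces sensitivity to device orientation.
    magnitude = np.linalg.norm(window, axis=1)   # shape (T,)
    mag_stats = np.array([magnitude.mean(), magnitude.std(),
                          magnitude.min(), magnitude.max()])
    return np.concatenate([axis_mean, axis_std, mag_stats])  # shape (10,)
```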
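The attention-based recalibration of shared-layer features can be sketched as a squeeze-and-excitation-style channel attention module feeding two task heads. This is a minimal sketch assuming PyTorch; the channel counts, reduction ratio, class counts, and module names are assumptions for illustration, not the authors' exact configuration.

```python
# Sketch: SE-style channel attention recalibrating shared ST-GCN features,
# followed by separate action and identity classification heads.
import torch
import torch.nn as nn

class ChannelRecalibration(nn.Module):
    """Squeeze spatio-temporal dimensions, then reweight channels."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, V) — batch, channels, frames, graph nodes.
        w = x.mean(dim=(2, 3))                       # squeeze -> (N, C)
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)   # per-channel weights
        return x * w                                 # recalibrated features

class MultiTaskHead(nn.Module):
    """Shared recalibrated features feed two classifiers (action, identity)."""
    def __init__(self, channels: int, n_actions: int, n_subjects: int):
        super().__init__()
        self.recalibrate = ChannelRecalibration(channels)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.action_head = nn.Linear(channels, n_actions)
        self.identity_head = nn.Linear(channels, n_subjects)

    def forward(self, shared: torch.Tensor):
        z = self.pool(self.recalibrate(shared)).flatten(1)  # (N, C)
        return self.action_head(z), self.identity_head(z)
```

In this arrangement, both tasks are trained end to end against a shared backbone, and the attention weights let each channel of the shared representation be emphasized or suppressed before the task-specific classifiers.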