In modern Wireless Sensor Networks (WSNs) and cyber-physical systems, multi-source, heterogeneous, and high-dimensional spatiotemporal data pose major challenges for accurate and robust multi-target prediction. Traditional models often fail to capture nonlinear dependencies and long-term temporal patterns, while deep learning methods may lack generalization, stability, and interpretability, especially under the resource constraints of WSNs. This paper proposes STGNet (Spatiotemporal Gradient Network), a fusion framework tailored for complex sequence prediction in WSN environments. By integrating the temporal memory of LSTM with the global dependency modeling of the Transformer, STGNet captures both localized node dynamics and cross-node interactions, effectively modeling the spatial correlations and routing variability inherent to WSNs. To improve robustness and adaptability, STGNet leverages TPE-based Bayesian optimization for efficient, automated hyperparameter tuning, and incorporates a SHAP-based interpretability module that quantifies the contribution of each sensor or feature dimension, enhancing transparency and trust in model outputs. Extensive experiments on real-world WSN datasets show that STGNet consistently outperforms LSTM, Transformer, and ensemble baselines in prediction accuracy, temporal consistency, and feature sensitivity. These results validate STGNet as a scalable and interpretable solution for environmental monitoring, resource scheduling, and adaptive control in intelligent wireless sensing systems.
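Purely as an illustration of the kind of LSTM/Transformer fusion the abstract describes, the sketch below combines an LSTM branch (localized temporal dynamics) with a Transformer-encoder branch (global dependencies) over a window of sensor readings. The class name STGNetBlock, the layer sizes, and the concatenation-based fusion are assumptions made for illustration only, not the authors' reference implementation.

```python
# Minimal sketch of an LSTM + Transformer fusion block; all names and
# hyperparameters here are illustrative assumptions, not STGNet's actual design.
import torch
import torch.nn as nn


class STGNetBlock(nn.Module):
    """Hypothetical fusion of LSTM temporal memory and Transformer attention."""

    def __init__(self, in_dim: int, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        # LSTM branch: per-node short/long-term temporal dynamics.
        self.lstm = nn.LSTM(in_dim, d_model, batch_first=True)
        # Transformer-encoder branch: global dependencies across the window.
        enc_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.proj = nn.Linear(in_dim, d_model)  # project raw features for attention
        self.head = nn.Linear(2 * d_model, 1)   # a multi-target head would widen this

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, in_dim) window of sensor readings
        h_lstm, _ = self.lstm(x)             # (batch, seq_len, d_model)
        h_attn = self.encoder(self.proj(x))  # (batch, seq_len, d_model)
        # Fuse the final LSTM state with the final attention state.
        fused = torch.cat([h_lstm[:, -1], h_attn[:, -1]], dim=-1)
        return self.head(fused)              # one prediction per window


if __name__ == "__main__":
    model = STGNetBlock(in_dim=8)
    window = torch.randn(16, 24, 8)  # 16 windows, 24 time steps, 8 sensor features
    print(model(window).shape)       # torch.Size([16, 1])
```

Concatenating the two branch outputs is only one simple fusion choice; gated fusion or cross-attention would be drop-in alternatives, and the hyperparameters shown here are exactly the kind of quantities the paper's TPE-based Bayesian optimization would tune.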