With the development of digital technology, digital image art has emerged as an art form that is reshaping how people perceive space and time. Artificial intelligence gives this art form a key impetus and is changing the means of artistic creation. This study examines how time consciousness and symbolic space are expressed in digital image art, and proposes an interpretation method based on computer vision. Its aim is to strengthen the spatio-temporal expression of digital image art and to integrate time consciousness with symbolic space effectively. Methodologically, the article proposes a dual-stream spatio-temporal fusion algorithm for digital images based on Swin Transformer. The algorithm comprises three modules: a temporal feature extraction network, a spatial feature extraction network, and a fusion network. Information is optimized through the CBAM module and features are processed through the RB module, which together enable deep-learning-driven creation of digital image art. The results show that the proposed algorithm performs well in the visual interaction of digital image art: its root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM) reach 1.217, 46.841, and 0.943, respectively, substantially better than those of the image segmentation algorithm, the cyclic differential filtering algorithm, and the focusing shape restoration algorithm with robust focusing volume regularization. In addition, its running time is only 2.07 seconds, more than 50% shorter than that of the other algorithms.
The Swin Transformer-based dual-stream spatio-temporal fusion algorithm thus provides technical support for expressing time consciousness and symbolic space in digital image art. It can effectively meet the requirements of digital image art design, promote the deep integration of digital image art with computer vision technology, and give users a richer visual experience.
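
For readers unfamiliar with the reported quality metrics, the following is a minimal sketch of how root mean square error and peak signal-to-noise ratio are conventionally computed between a reference image and a generated one. It is not the evaluation code used in this study: the function names, the flattened pixel lists, and the 8-bit peak value of 255 are all assumptions made for illustration, and structural similarity is omitted because it requires a windowed computation.

```python
import math


def rmse(reference, estimate):
    """Root mean square error between two equally sized pixel sequences."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - e) ** 2 for r, e in zip(reference, estimate)]
    return math.sqrt(sum(squared) / len(squared))


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in decibels.

    max_value is the assumed peak pixel value (255 for 8-bit images).
    """
    error = rmse(reference, estimate)
    if error == 0:
        return math.inf  # identical images: the ratio is unbounded
    return 20.0 * math.log10(max_value / error)


# Hypothetical 2x2 grayscale images, flattened row by row.
reference = [52, 55, 61, 59]
estimate = [54, 55, 60, 57]

print("RMSE:", rmse(reference, estimate))
print("PSNR:", psnr(reference, estimate))
```

A lower root mean square error and a higher peak signal-to-noise ratio both indicate that the generated image is closer to the reference, which is the sense in which the figures above are compared across algorithms.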