As multi-track music generation tasks grow more complex, it has become increasingly difficult to capture the interdependencies between tracks while producing high-quality music that complies with music theory. To address the problems of coordinated generation and music-theory constraints in multi-track music generation, we propose a novel framework, M2S-GAN, based on Generative Adversarial Networks (GANs) and the Transformer. To enable collaborative multi-track generation, we introduce a Cross-Track Attention mechanism, which uses cross-track self-attention to capture the intricate relationships and long-term dependencies among tracks. To ensure that the generated music satisfies conventional music-theory standards of harmony, melody, and rhythm, the model also encodes music theory rules as mathematical constraints that guide the generation process. Through a carefully designed architecture of multiple generator and discriminator networks, M2S-GAN improves not only the diversity and quality of the generated music but also the stability of generation. Experimental results show that the proposed model outperforms existing approaches in the quality, stability, and musical plausibility of the generated multi-track music, and that it continues to perform well across a variety of datasets and evaluation metrics. Our work offers a fresh perspective on multi-track music generation and provides robust support for automated music generation and composition.
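To make the Cross-Track Attention idea concrete, the sketch below shows one plausible reading of it: at each time step, attention runs over the track axis so every track's representation is conditioned on the concurrent content of the other tracks. This is a minimal illustration only; the module name `CrossTrackAttention`, the PyTorch implementation, and all tensor shapes and hyperparameters are assumptions, as the abstract does not specify implementation details.

```python
import torch
import torch.nn as nn


class CrossTrackAttention(nn.Module):
    """Illustrative sketch (assumed design, not the paper's exact module):
    self-attention applied across tracks at each time step, so each track
    attends to the simultaneous states of all other tracks."""

    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_tracks, time, d_model)
        b, n_tracks, t, d = x.shape
        # Fold the time axis into the batch so that each attention instance
        # sees the n_tracks token sequence for a single time step.
        x_tracks = x.permute(0, 2, 1, 3).reshape(b * t, n_tracks, d)
        attended, _ = self.attn(x_tracks, x_tracks, x_tracks)
        out = self.norm(x_tracks + attended)  # residual connection + layer norm
        # Restore the original (batch, n_tracks, time, d_model) layout.
        return out.reshape(b, t, n_tracks, d).permute(0, 2, 1, 3)


# Usage: 4 tracks (e.g. melody, harmony, bass, drums), 32 time steps.
x = torch.randn(2, 4, 32, 128)
print(CrossTrackAttention()(x).shape)  # torch.Size([2, 4, 32, 128])
```

Long-range dependencies within each track would then be handled by standard temporal self-attention layers, with the two attention types interleaved; that division of labor is likewise an assumption made for this sketch.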