Given the severe harm that software vulnerabilities can cause, vulnerability mining in the software supply chain has become a key focus for security researchers. Fuzz testing is an effective technique for automated vulnerability mining in the software supply chain, and this paper applies reinforcement learning to it: the fuzz testing process is modeled as a reinforcement learning problem, and the Deep Deterministic Policy Gradient (DDPG) algorithm is used to select strategies and solve the modeled problem. Additionally, this paper proposes an automated software vulnerability repair method based on large models, which improves the model's repair performance at three stages: the input, the model itself, and the output. Experimental results show that the proposed vulnerability detection method covers target sites 3.43 times and 1.45 times faster than the baseline methods and discovers real target vulnerabilities 3.67 times and 1.84 times faster, demonstrating superior software supply chain vulnerability detection capability. Compared with other methods, the proposed repair method achieves the best repair results across different vulnerability types and vulnerable program lengths, with recall improved by 39.38% to 142.49% in comparative experiments, demonstrating superior vulnerability repair performance.