This paper applies deep reinforcement learning to automated penetration testing in order to plan and optimize penetration testing supply and defense paths. After modeling the automated penetration testing problem, the paper simplifies and evaluates the DQN algorithm from deep reinforcement learning, uses sample augmentation to find the optimal penetration path, and proposes the MASK-SALT-DQN algorithm. Simulation experiments verify that the algorithm operates correctly and is effective. In both simple and complex scenarios, MASK-SALT-DQN achieves the shortest runtime, which significantly improves the agent's learning efficiency. The algorithm also provides accurate evaluation criteria for penetration testing path planning results. Compared with penetration testing learning algorithms based on Nature DQN, the learning curve of MASK-SALT-DQN converges to a higher value, indicating better convergence performance.