Abstract: To address the slow convergence of jamming decision methods based on reinforcement learning, a jamming decision algorithm with self-adaptive planning steps, built on the Dyna-Q algorithm, is proposed. While preserving the effectiveness of the jamming strategy, the algorithm improves the convergence speed of reinforcement learning, so that the optimal jamming strategy is learned faster. Experimental and simulation results show that the algorithm achieves real-time, effective jamming of multi-function radar. The approach can also be extended to other reinforcement learning applications and therefore has reference value.
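To make the idea concrete, the following is a minimal sketch of tabular Dyna-Q in which the number of planning steps varies from one real interaction to the next. The abstract does not state the adaptation rule, so the rule used here (more planning when the latest temporal-difference error is large) is an assumption for illustration only, as are the function names, the hyperparameters, and the toy five-state chain that stands in for radar working modes and jamming actions.

```python
import random


def adaptive_planning_steps(td_error, n_min=0, n_max=20, scale=1.0):
    """Hypothetical rule: plan more when the latest TD error is large."""
    ratio = min(abs(td_error) / scale, 1.0)
    return n_min + round(ratio * (n_max - n_min))


def chain_step(state, action, goal=4):
    """Toy environment (assumed): action 0 moves left, action 1 moves right.
    Reaching the goal state yields reward 1, every other move yields 0."""
    next_state = max(0, state - 1) if action == 0 else min(goal, state + 1)
    reward = 1.0 if next_state == goal else 0.0
    return next_state, reward


def dyna_q(step_fn, n_states, n_actions, start, goal, episodes=50,
           alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=1000, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    model = {}  # learned deterministic model: (state, action) -> (next_state, reward)
    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            # Direct reinforcement learning from real experience.
            s2, r = step_fn(s, a)
            td = r + gamma * max(q[s2]) - q[s][a]
            q[s][a] += alpha * td
            model[(s, a)] = (s2, r)
            # Planning: replay simulated experience from the learned model.
            # The step count adapts to the size of the latest TD error.
            keys = list(model)
            for _ in range(adaptive_planning_steps(td)):
                ps, pa = rng.choice(keys)
                ps2, pr = model[(ps, pa)]
                q[ps][pa] += alpha * (pr + gamma * max(q[ps2]) - q[ps][pa])
            s = s2
            if s == goal:
                break
    return q


q_table = dyna_q(chain_step, n_states=5, n_actions=2, start=0, goal=4)
greedy_policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(4)]
print(greedy_policy)
```

In this sketch, planning effort concentrates on the interactions where the value estimates are still changing and falls toward zero once they settle, which is one plausible way to trade computation against convergence speed. Other adaptation signals are equally possible.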