Abstract: To address the problem that high variance destabilizes the training process and degrades algorithm performance, a reduced-variance deep deterministic policy gradient (RV-DDPG) algorithm is proposed. By delaying updates of the target policy, the number of update errors and their accumulation are reduced; by smoothing the target policy, the single-step error is reduced and the variance is stabilized. RV-DDPG, the traditional deep deterministic policy gradient (DDPG) algorithm, and the widely used asynchronous advantage actor-critic (A3C) algorithm are applied to the Pendulum, MountainCarContinuous, and HalfCheetah problems. The experimental results show that RV-DDPG achieves better convergence and stability, demonstrating the effectiveness of the variance-reduction method.
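The two variance-reduction mechanisms named in the abstract (delayed target-policy updates and target-policy smoothing) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, hyperparameter values (`noise_std`, `noise_clip`, `policy_delay`), and action bounds are all assumptions chosen for the sketch.

```python
import numpy as np

def smoothed_target_action(target_policy, next_state, noise_std=0.2,
                           noise_clip=0.5, action_low=-1.0, action_high=1.0):
    """Target-policy smoothing: add clipped Gaussian noise to the target
    action so the critic's target value is effectively averaged over nearby
    actions, reducing the single-step error (hyperparameters illustrative)."""
    action = target_policy(next_state)
    noise = np.clip(np.random.normal(0.0, noise_std, size=np.shape(action)),
                    -noise_clip, noise_clip)
    return np.clip(action + noise, action_low, action_high)

def update_schedule(step, policy_delay=2):
    """Delayed target-policy update: the critic is updated every step, while
    the actor and target networks are updated only every `policy_delay`
    steps, limiting how fast errors can accumulate."""
    update_critic = True
    update_actor_and_targets = (step % policy_delay == 0)
    return update_critic, update_actor_and_targets
```

In a full training loop, `smoothed_target_action` would feed the critic's bootstrap target, and `update_schedule` would gate the actor and target-network updates.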