Hi, I understand why we have a Q_hat network: it keeps the y_target values stable for the sake of learning stability, and we update Q_hat's parameters slightly toward Q's parameters after each training iteration (every C time steps) with a soft update.
My question is: even though the soft update uses a tau << 1, do the two networks' parameters eventually converge? I don't think this was discussed in either the lectures or the labs.
Under certain conditions, the parameters of the target network \hat{Q} and the main network Q (parameterized by \theta_{\hat{Q}} and \theta_{Q} respectively) will eventually converge.
The soft update rule for the target network is: \theta_{\hat{Q}} \leftarrow \tau \theta_{Q} + (1 - \tau) \theta_{\hat{Q}},
where \tau \ll 1 is a small positive scalar (e.g. \tau = 0.001).
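Here is a minimal sketch of how such a soft (Polyak) update is typically implemented, assuming a PyTorch setup; the names q_net, q_target, and soft_update are illustrative, not taken from the course code.

```python
# Soft (Polyak) target update: theta_target <- tau*theta_source + (1 - tau)*theta_target
import copy
import torch
import torch.nn as nn

tau = 0.001                                    # soft-update coefficient, tau << 1

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))  # main Q network
q_target = copy.deepcopy(q_net)                # target network starts as an exact copy

@torch.no_grad()
def soft_update(target: nn.Module, source: nn.Module, tau: float) -> None:
    """Move each target parameter a fraction tau toward the corresponding source parameter."""
    for t_param, s_param in zip(target.parameters(), source.parameters()):
        t_param.mul_(1.0 - tau).add_(tau * s_param)

# Illustration: if q_net stops changing, the parameter gap shrinks by (1 - tau) per update.
def gap() -> float:
    return sum((t - s).abs().sum().item()
               for t, s in zip(q_target.parameters(), q_net.parameters()))

with torch.no_grad():
    for p in q_net.parameters():               # perturb q_net once, then leave it frozen
        p.add_(torch.randn_like(p))

initial = gap()
for _ in range(5000):
    soft_update(q_target, q_net, tau)
print(gap() / initial)                         # roughly (1 - tau)**5000 ≈ 0.007
```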
As the update rule shows, every application moves \theta_{\hat{Q}} a fraction \tau of the remaining distance toward \theta_{Q}, so over time \theta_{\hat{Q}} approaches \theta_{Q}, provided \theta_{Q} stops changing or its changes slow down. That happens when the learning process stabilizes: the environment is not excessively stochastic, the agent learns effectively, the loss is minimized, and the Q-values approach the true expected return. In practice, however, the two may never fully converge: continued exploration (e.g. via an \epsilon-greedy policy) or high stochasticity in the environment can keep \theta_{Q} from ever settling.
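To make this quantitative: the soft update turns \theta_{\hat{Q}} into an exponential moving average of \theta_{Q}. In the idealized case where \theta_{Q} is held fixed, applying the update k times gives

\theta_{\hat{Q}}^{(k)} - \theta_{Q} = (1 - \tau)^{k} \left( \theta_{\hat{Q}}^{(0)} - \theta_{Q} \right),

so the gap decays geometrically; with \tau = 0.001 it drops to about e^{-1} of its initial size after roughly 1000 updates.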
In reinforcement learning the focus is usually on stabilizing training and ensuring convergence of the Q network itself; \hat{Q} merely provides a slowly evolving estimate of the expected return. This paper shows that, under the assumption that a good DNN approximation to the optimal Q-value function exists (plus some other technical assumptions), the DQN algorithm converges.