# Convergence of the DQN algorithm

I’m having some trouble understanding why the DQN algorithm converges towards the true state-action value function. I have read some blog posts, especially this one (it discusses Q-tables rather than DQN, but it feels as if the answer should be quite similar): Reinforcement Learning Explained Visually (Part 4): Q Learning, step-by-step | by Ketan Doshi | Towards Data Science.

I get the sense that it’s in the terminal states that the DQN algorithm starts producing more accurate approximations. With Q(s, a) = R(s) + gamma * max_a' Q(s', a'), if s happens to be a terminal state, do we then need to define Q(s, a) = R(s) so that the terminal Q-value “gets updated with solely real reward data and no estimated values”?

So basically my question is: is it in the terminal states that the DQN algorithm starts getting better at approximating the Q-values (i.e., do we need terminal states for the algorithm to work at all?), and if so, do we need to set Q(s, a) = R(s) whenever s is a terminal state?

First, by definition, when s is a terminal state you have no choice but Q(s, a) = R(s): there are no further actions to take, so there is no future return to bootstrap from.
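A minimal sketch of this rule, assuming the usual one-step DQN target and a hypothetical `td_target` helper (the function name and `done` flag are my own, not from the question):

```python
import numpy as np

def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step bootstrapped target for Q(s, a).

    next_q_values: estimated Q(s', a') for every action a' in s'.
    done: True if s' is terminal -- then there is no future return
    to bootstrap, so the target collapses to the observed reward.
    """
    if done:
        return reward                      # terminal: target is pure reward
    return reward + gamma * np.max(next_q_values)

# Terminal transition: only real reward data, no estimated values.
print(td_target(1.0, np.array([0.5, 2.0]), done=True))   # 1.0
# Non-terminal transition: reward plus discounted best next estimate.
print(td_target(1.0, np.array([0.5, 2.0]), done=False))  # 1.0 + 0.99 * 2.0 = 2.98
```

This is exactly the `if done` branch you see in most DQN implementations: it is not an extra trick, just the definition of the target at a terminal state.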

Second, the DQN improves over the course of training, and the improvement happens not only at the terminal states. Moreover, even if it were true that improvement happened only at the terminal states, that would not help us, because we need to train a DQN that works at all the other states as well. Remember that the DQN’s output is state-dependent.

So, each time we train the DQN it should improve, and that improvement accumulates over many training steps.