I have a problem understanding why we need to use Deep Learning to get the value of Q(s, a). We are training this network with our formula Q(s, a) = R(s) + gamma * max(Q(s’, a’)). This means that the neural network is trained to give answers close to our formula, so why can’t we use the formula itself instead of training a network?

Hey @Chandni_Kausika,

That’s indeed a nice question. What you are essentially asking is why we use Deep Q-Learning (DQN) (*estimating the Q values with a neural network*) instead of Q-Learning (*applying the formula iteratively to converge to the true values*). There are many advantages of Deep Q-Learning over Q-Learning, but perhaps the most important one, at least to the best of my knowledge, shows up in problems involving **continuous state-action spaces**.

When we use the formula iteratively, it only converges to the true values under the assumption that every state-action pair is visited infinitely many times. Now, in mathematics, whenever we can’t compute something exactly, we approximate it. So we approximate this assumption too: if a state-action pair is visited a large number of times, we assume that the converged values are close to the true values.
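To make the iterative idea concrete, here is a minimal sketch of tabular Q-learning on a toy problem. The environment (3 states, 2 actions, hand-made deterministic rewards) is purely illustrative, not from the lab; the point is that the formula is applied as a repeated update, and it only works because every state-action pair gets visited many times.

```python
import numpy as np

n_states, n_actions = 3, 2
gamma, alpha = 0.9, 0.1  # discount factor and learning rate

# Hypothetical deterministic toy dynamics: next_state[s][a] and reward[s][a]
next_state = [[1, 2], [2, 0], [0, 1]]
reward = [[0.0, 1.0], [0.0, 0.0], [1.0, 0.0]]

# One table entry per state-action pair -- only possible because the
# state-action space is small and discrete
Q = np.zeros((n_states, n_actions))

rng = np.random.default_rng(0)
s = 0
for _ in range(10_000):  # visit every state-action pair many times
    a = int(rng.integers(n_actions))      # explore uniformly at random
    s2, r = next_state[s][a], reward[s][a]
    # Nudge Q(s, a) toward the target R + gamma * max_a' Q(s', a')
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = s2

print(Q)  # the table settles at the true Q-values for this tiny MDP
```

With infinitely many states (like the lander's continuous coordinates) this table would need infinitely many rows, which is exactly where the iterative approach breaks down.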

However, this assumption is hard to satisfy when there are infinitely many state-action pairs, as in the lunar lander scenario. Even if we represent the state with a single parameter, say the **x-coordinate**, instead of the 8 parameters used in the lab, there are still infinitely many possible values of that x-coordinate. Hence, we can’t apply the formula iteratively in this scenario, and that gives way to something known as **function approximation**, which forms the backbone of Deep Q-Learning.

I am not very familiar with the concept of function approximation yet, but as far as my understanding goes, we try to find a function that takes continuous inputs (*for instance, those representing the state and action*) and outputs the state or state-action values; in Deep Q-Learning, the neural network plays the role of that function. Let me know if this helps.
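Here is a minimal sketch of that idea in plain NumPy (not the lab's actual TensorFlow code); the state dimension, network size, and sample numbers are all illustrative assumptions. A small network maps a continuous state vector to one Q-value per action, so it can produce an estimate even for a state it has never seen, which is exactly what a table cannot do.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, n_actions, hidden = 2, 3, 16  # illustrative sizes
gamma = 0.99

# One-hidden-layer network: continuous state in, one Q-value per action out
W1 = rng.normal(0.0, 0.1, (state_dim, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 0.1, (hidden, n_actions))
b2 = np.zeros(n_actions)

def q_values(s):
    """Forward pass: works for ANY continuous state, visited before or not."""
    h = np.maximum(0.0, s @ W1 + b1)   # ReLU hidden layer
    return h @ W2 + b2

# A continuous state the network was never trained on still gets estimates:
s = np.array([0.37, -1.42])
q = q_values(s)
print(q.shape)  # (3,) -- one Q-value estimate per action

# The training target is the same formula from the question; gradient
# descent then nudges the network's output for (s, a) toward it:
s_next, r = np.array([0.40, -1.30]), 1.0
target = r + gamma * np.max(q_values(s_next))
```

The key design point is generalization: nearby states share network weights, so experience at one x-coordinate informs the estimate at a neighboring one, instead of each value being learned in isolation.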

P.S. - If you wanna read more about it, do check out the **Reinforcement Learning** book by **Richard S. Sutton and Andrew G. Barto**.

Cheers,

Elemento

I think one primary reason is that we simply don’t know Q(s, a), and we want to learn it by exploring. Then it comes down to how to “save” what we have learned, and as @Elemento pointed out, in the case of a continuous state-action space, generalizing that knowledge in a NN is a smarter choice than having to go through the whole history every time we make a new decision on an action.

However, if we knew the complete form of Q(s, a) in advance and it were efficient to use, I don’t see why we would have to learn a DQN.

Raymond

Thank you for that! It definitely helped, although the concept is still a little hard to grasp. Interestingly though, Andrew G. Barto is a Professor at my University and I will make sure to read that book!

Thank you, that helps!

That’s just awesome! It’s my dream to meet him one day, and I really hope that it will happen some day. All the best to you!

Cheers,

Elemento

You are welcome @Chandni_Kausika. If you have questions about RL, besides going to Professor Barto’s office, which is the best option, I am sure @Elemento and the rest of us would be happy to discuss them with you here.