I have a problem understanding why we need to use Deep Learning to get the value of Q(s, a). We are training this network with our formula Q(s, a) = R(s) + gamma * max(Q(s’, a’)). This means that the neural network is trained to give answers close to our formula, so why can’t we use the formula itself instead of training a network?

Hey @Chandni_Kausika,

That’s indeed a nice question. What you are essentially asking is why we use Deep Q-Learning (DQN) (*estimating the Q values with a neural network*) instead of Q-Learning (*applying the formula iteratively to converge to the true values*). There are many advantages of Deep Q-Learning over Q-Learning, but perhaps the most important one, at least to the best of my knowledge, shows up in problems involving **continuous state-action spaces**.

When we use the formula iteratively, it only converges to the true values under the assumption that every state-action pair is visited infinitely many times. Now, in mathematics, whenever we can’t compute something exactly, we approximate it. So we approximate this assumption too: if a state-action pair is visited a large number of times, we assume that the converged values are close to the true values.
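To make the iterative idea concrete, here is a minimal sketch of tabular Q-learning on a toy problem. The environment (3 states, 2 actions, hand-made deterministic rewards) is purely illustrative, not from the lab; the point is that the formula is applied as a repeated update, and it only works because every state-action pair gets visited many times.

```python
import numpy as np

n_states, n_actions = 3, 2
gamma, alpha = 0.9, 0.1  # discount factor and learning rate

# Hypothetical deterministic toy dynamics: next_state[s][a] and reward[s][a]
next_state = [[1, 2], [2, 0], [0, 1]]
reward = [[0.0, 1.0], [0.0, 0.0], [1.0, 0.0]]

# One table entry per state-action pair -- only possible because the
# state-action space is small and discrete
Q = np.zeros((n_states, n_actions))

rng = np.random.default_rng(0)
s = 0
for _ in range(10_000):  # visit every state-action pair many times
    a = int(rng.integers(n_actions))      # explore uniformly at random
    s2, r = next_state[s][a], reward[s][a]
    # Nudge Q(s, a) toward the target R + gamma * max_a' Q(s', a')
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = s2

print(Q)  # the table settles at the true Q-values for this tiny MDP
```

With infinitely many states (like the lander's continuous coordinates) this table would need infinitely many rows, which is exactly where the iterative approach breaks down.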

However, this assumption is hard to satisfy when there are infinitely many state-action pairs, as in the lunar lander scenario. Even if we represent the state with a single parameter, say the **x-coordinate**, instead of the 8 parameters used in the lab, there are still infinitely many possible values of that x-coordinate. Hence, we can’t apply the formula iteratively in this scenario, and that gives way to something known as **function approximation**, which forms the backbone of Deep Q-Learning.

I am not very familiar with the concept of function approximation yet, but as far as my understanding goes, we try to find a function that takes continuous inputs (*for instance, those representing the state and action*) and outputs the state or state-action values; in Deep Q-Learning, the neural network plays the role of that function. Let me know if this helps.
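Here is a minimal sketch of that idea in plain NumPy (not the lab's actual TensorFlow code); the state dimension, network size, and sample numbers are all illustrative assumptions. A small network maps a continuous state vector to one Q-value per action, so it can produce an estimate even for a state it has never seen, which is exactly what a table cannot do.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, n_actions, hidden = 2, 3, 16  # illustrative sizes
gamma = 0.99

# One-hidden-layer network: continuous state in, one Q-value per action out
W1 = rng.normal(0.0, 0.1, (state_dim, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 0.1, (hidden, n_actions))
b2 = np.zeros(n_actions)

def q_values(s):
    """Forward pass: works for ANY continuous state, visited before or not."""
    h = np.maximum(0.0, s @ W1 + b1)   # ReLU hidden layer
    return h @ W2 + b2

# A continuous state the network was never trained on still gets estimates:
s = np.array([0.37, -1.42])
q = q_values(s)
print(q.shape)  # (3,) -- one Q-value estimate per action

# The training target is the same formula from the question; gradient
# descent then nudges the network's output for (s, a) toward it:
s_next, r = np.array([0.40, -1.30]), 1.0
target = r + gamma * np.max(q_values(s_next))
```

The key design point is generalization: nearby states share network weights, so experience at one x-coordinate informs the estimate at a neighboring one, instead of each value being learned in isolation.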

P.S. - If you wanna read more about it, do check out the **Reinforcement Learning** book by **Richard S. Sutton and Andrew G. Barto**.

Cheers,

Elemento

I think one primary reason is that we simply don’t know Q(s, a), and we want to learn it by exploring. Then it comes down to how to “save” what we have learned, and as @Elemento pointed out, in the case of a continuous state-action space, generalizing that knowledge in a NN is a smarter choice than having to go through the whole history every time we make a new decision on an action.

However, if we knew the complete form of Q(s, a) in advance and it were efficient to use, I don’t see why we would have to learn a DQN.

Raymond

Thank you for that! It definitely helped, although the concept is still a little hard to grasp. Interestingly though, Andrew G. Barto is a Professor at my University and I will make sure to read that book!

Thank you, that helps!

That’s just awesome! It’s my dream to meet him one day, and I really hope that it will happen some day. All the best to you!

Cheers,

Elemento

You are welcome @Chandni_Kausika. If you have questions about RL, besides going to Professor Barto’s office, which is the best option, I am sure @Elemento and the rest of us would be happy to discuss them with you here.