How does the neural network compute the Q function

Bilel_Djemel · March 20, 2023, 7:44pm

This is the Bellman equation for the Q function:

Q(s, a) = R(s) + γ max_{a’} Q(s’, a’)
The ANN starts out with random Q estimates and then iteratively, using a training set of states and their corresponding computed Q values, fits a model that predicts the Q values better.
My question is: how do we know that, after many iterations, the output is really an approximation of the true Q value? I cannot figure out how the Bellman equation is applied in the ANN.

rmwkwok · March 21, 2023, 12:30am

Hello @Bilel_Djemel

We do not know the true Q function beforehand, so we use a neural network in the hope of modeling it properly. Because the true Q function is unknown, we cannot compare against it directly; a good way to verify the learned one is to apply it back in the environment and see whether the decisions it leads to are good ones.

If you go through the assignment of the week, you will see that it uses the same form as the Bellman equation for computing the so-called “y target”. I believe working through an example should give you a more concrete idea of how things are put together.
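To illustrate the idea, here is a minimal NumPy sketch of how such a “y target” could be computed for a small batch. The helper name `compute_y_targets`, the array shapes, and all numbers are assumptions made up for this example; the assignment's own code may look different.

```python
import numpy as np

def compute_y_targets(rewards, next_q_values, done, gamma):
    """Bellman-style targets.

    y = R + gamma * max_a' Q(s', a')   if s' is not terminal
    y = R                              if s' is terminal

    rewards:       shape (batch,)
    next_q_values: shape (batch, num_actions), the network's current
                   estimate of Q(s', .) for each example (hypothetical values)
    done:          shape (batch,), 1.0 where the episode ended, else 0.0
    """
    max_next_q = np.max(next_q_values, axis=1)
    return rewards + gamma * max_next_q * (1.0 - done)

# Made-up batch of three experiences with two possible actions each.
rewards = np.array([1.0, 0.0, 100.0])
next_q = np.array([[2.0, 5.0], [1.0, 0.5], [7.0, 9.0]])
done = np.array([0.0, 0.0, 1.0])

y = compute_y_targets(rewards, next_q, done, gamma=0.5)
print(y)
```

The network is then trained so that its output for (s, a) moves toward y, which is where the Bellman equation enters the training.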

Cheers,
Raymond

Bilel_Djemel · March 21, 2023, 9:44am

Can’t we use a recursive function?
To calculate Q(s’,a’) in the Bellman equation, we just call the same function, this time with s’ and a’ as the parameters. This requires us to come up with the value of Q(s",a"), which is again computed from the new parameters using the Bellman equation, and we stop when we reach a terminal state, for example.
But I think this method is expensive in terms of complexity.
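To make the cost concern concrete, here is a rough sketch of the recursive idea on a hypothetical 6-state chain with two actions (left and right). The environment, rewards, discount factor, and function names are all illustrative assumptions. Since the agent could move back and forth forever, the recursion needs a depth cap, and every extra level can double the number of calls.

```python
# Hypothetical chain: states 0..5, terminal at both ends.
TERMINAL = {0, 5}
REWARD = {0: 100.0, 5: 40.0}   # R(s); every other state gives 0
ACTIONS = (-1, +1)             # move left or right

def q_recursive(s, a, depth, counter, gamma=0.5):
    """Q(s, a) = R(s) + gamma * max_a' Q(s', a'), expanded recursively."""
    counter[0] += 1            # count how many calls are made
    r = REWARD.get(s, 0.0)
    if s in TERMINAL or depth == 0:
        return r               # stop at a terminal state or at the depth cap
    s_next = s + a
    return r + gamma * max(
        q_recursive(s_next, a2, depth - 1, counter, gamma) for a2 in ACTIONS
    )

def count_calls(depth):
    counter = [0]
    q_recursive(2, +1, depth, counter)
    return counter[0]

for d in (2, 4, 8, 16):
    print(d, count_calls(d))
```

This is only for a deterministic toy problem with 6 states. With a continuous state space or random transitions, the same state is essentially never revisited, so the recursion cannot even be cached.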

rmwkwok · March 21, 2023, 3:07pm

The assignment of this week has a very good example of using a Neural Network to model the Q function, and how we use it in a simulated Lunar Lander problem. Going through it should give you a concrete idea of how it works, and you will see that there is no need to recursively compute anything.
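As a rough illustration of why no recursion is needed, the sketch below uses a small table as a stand-in for the neural network (a tabular Q-learning style update swapped in for a gradient step) on a hypothetical 6-state chain. The environment, learning rate, and discount factor are assumptions for this example and are not taken from the assignment. Each update looks up the current estimate of Q(s’, ·) once to build the target y, then nudges Q(s, a) toward it.

```python
import numpy as np

GAMMA, ALPHA = 0.5, 0.5
REWARD = np.array([100.0, 0.0, 0.0, 0.0, 0.0, 40.0])   # R(s)
TERMINAL = np.array([True, False, False, False, False, True])
ACTIONS = (-1, +1)                                      # left, right

rng = np.random.default_rng(0)
# Random initial guess, playing the role of untrained network weights.
Q = rng.normal(size=(len(REWARD), len(ACTIONS)))

for step in range(5000):
    # Sample a random (state, action) experience.
    s = rng.integers(len(REWARD))
    a = rng.integers(len(ACTIONS))
    if TERMINAL[s]:
        y = REWARD[s]
    else:
        s_next = s + ACTIONS[a]
        # One lookup of the *current* estimate, no recursion.
        y = REWARD[s] + GAMMA * Q[s_next].max()
    # Move the estimate part of the way toward the target.
    Q[s, a] += ALPHA * (y - Q[s, a])

print(np.round(Q.max(axis=1), 2))
```

In this toy setting the best value per state should settle near 100, 50, 25, 12.5, 20, 40, which is the solution of the Bellman equation for this chain, even though the starting values were random. The real assignment replaces the table with a network trained by gradient descent.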

Raymond
