This is the Bellman equation for the Q function.
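
For reference, one common way to write it is shown below. The notation is an assumption made here rather than something taken from the post: $R(s)$ is the reward for the current state, $\gamma$ is the discount factor, and $s'$ is the state reached by taking action $a$ in state $s$.

$$Q(s, a) = R(s) + \gamma \max_{a'} Q(s', a')$$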

At first the ANN computes the Q function from randomly initialized weights. Then, iteratively, using a training set of different states and the corresponding computed Q values, it learns a better model that predicts the Q values more accurately.

My question is: how do we know that, after many iterations, the output is really an approximation of the true Q value? I cannot figure out how the Bellman equation is applied in the ANN.

Hello @Bilel_Djemel

We do not know the true Q function beforehand, so we use a neural network in the hope of modeling it properly. Since we do not know the true Q function, a good way to verify the learned one is to apply it back in the environment and see whether it makes good decisions.
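
To make "apply it back in the environment" a bit more concrete, here is a minimal sketch. Everything in it is hypothetical and only for illustration: `ChainEnv` is a made-up toy environment, `average_return` is a made-up helper, and a small table of Q values stands in for the network's predictions.

```python
import numpy as np

class ChainEnv:
    """Hypothetical toy environment: states 0..4, start at 0, reward 1 on reaching state 4."""
    n_states, n_actions = 5, 2  # action 0 = left, action 1 = right

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def average_return(env, q_values, episodes=10, max_steps=20):
    """Run the greedy policy argmax_a Q(s, a) and average the total reward per episode."""
    totals = []
    for _ in range(episodes):
        state, total = env.reset(), 0.0
        for _ in range(max_steps):
            action = int(np.argmax(q_values[state]))  # act on the Q estimates
            state, reward, done = env.step(action)
            total += reward
            if done:
                break
        totals.append(total)
    return float(np.mean(totals))

good_q = np.tile([0.0, 1.0], (5, 1))  # stand-in estimates that always prefer "right"
bad_q = np.tile([1.0, 0.0], (5, 1))   # stand-in estimates that always prefer "left"
good_score = average_return(ChainEnv(), good_q)
bad_score = average_return(ChainEnv(), bad_q)
print(good_score, bad_score)
```

A Q function whose greedy decisions collect more reward on average is, in this practical sense, the better model, even though the true Q values are never observed directly.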

If you go through the assignment of the week, you will see that it uses the same form as the Bellman equation for computing the so-called “y target”. I believe working through an example should give you a more concrete idea of how things are put together.
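
As a rough illustration of how the Bellman form turns into training targets, here is a minimal NumPy sketch. It is not the assignment's code: the function name `compute_y_targets`, the array shapes, and the value of `gamma` are assumptions made for this example, and `q_next` stands for whatever the current network predicts for the next states.

```python
import numpy as np

def compute_y_targets(rewards, q_next, done, gamma=0.995):
    """Bellman-style targets: y = R if the episode ended, else R + gamma * max_a' Q(s', a').

    rewards: shape (batch,)      reward observed for each transition
    q_next:  shape (batch, n_a)  current network's Q estimates for each next state s'
    done:    shape (batch,)      1.0 if s' is terminal, else 0.0
    """
    max_q_next = np.max(q_next, axis=1)  # best estimated value available from s'
    return rewards + gamma * max_q_next * (1.0 - done)

# Three made-up transitions; the last one ends the episode.
rewards = np.array([1.0, 0.0, 100.0])
q_next = np.array([[0.5, 2.0], [1.0, -1.0], [3.0, 4.0]])
done = np.array([0.0, 0.0, 1.0])
y = compute_y_targets(rewards, q_next, done, gamma=0.5)
print(y)
```

The network is then fit so that its prediction Q(s, a) for the action actually taken moves toward y. Repeating this over many batches is how the Bellman equation enters the training, one step of lookahead at a time.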

Cheers,

Raymond

Can’t we use a recursive function?

To calculate the new Q(s’,a’) in the Bellman equation, we just call the same function, but this time with s’ and a’ as the parameters. This requires us to come up with the value of Q(s’’,a’’), which is again computed with the new parameters using the Bellman equation, and we stop when we reach a terminal state, for example.

But I think this method is expensive in terms of computational complexity.
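
A minimal sketch of that recursive idea, to show where the cost comes from. The toy world is an assumption made for this example: a deterministic line of integer states where action 0 moves left and action 1 moves right, reaching `goal` gives reward 1 and ends the episode, and a `depth` cap is added so the recursion always stops. The name `recursive_q` and the call counter are hypothetical.

```python
def recursive_q(state, action, depth, gamma=0.5, goal=3, counter=None):
    """Expand the Bellman equation by recursion on a toy deterministic line world."""
    if counter is not None:
        counter[0] += 1  # count how many times the function is called
    next_state = state + (1 if action == 1 else -1)
    reward = 1.0 if next_state == goal else 0.0
    if next_state == goal or depth == 0:
        return reward  # terminal state or depth cap: no future term
    # One recursive call per possible next action: the call tree branches here.
    best_next = max(
        recursive_q(next_state, a, depth - 1, gamma, goal, counter) for a in (0, 1)
    )
    return reward + gamma * best_next

shallow_calls, deep_calls = [0], [0]
q_shallow = recursive_q(0, 1, depth=4, counter=shallow_calls)
q_deep = recursive_q(0, 1, depth=12, counter=deep_calls)
print(q_shallow, q_deep, shallow_calls[0], deep_calls[0])
```

Both calls return the same value here, but the deeper one makes far more calls, because every level branches once per action. The sketch also relies on knowing exactly which state each action leads to, which this toy world provides by construction.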

This week's assignment has a very good example of using a neural network to model the Q function, and of how we use it in a simulated Lunar Lander problem. Going through it should give you a concrete idea of how it works, and you will see that there is no need to recursively compute anything.

Raymond