Hi @wgl
The cost function is built from the Bellman equation (it measures the error between the target y and the network's Q estimate), and it's being used because we are estimating the optimal action-value function.
This is explained in section 6 - Deep Q-Learning of the lab.
Sam
Thanks, but I am still confused about why, if the episode terminates at step j+1, y_j = R_j rather than y_j = R_j + γ max Q.
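I think all it’s saying is that if the episode ends at step j+1, there are no further steps after it and so no future rewards to collect. The γ max Q term stands for those future rewards, so you ignore it and the target is just R_j.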
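
In case it helps, here is a minimal sketch of how that rule could look in code. It is not the lab's implementation: the function name `compute_targets`, the argument names, and the `done` flag (1 if the episode terminates at step j+1, 0 otherwise) are all my own assumptions for illustration, and the numbers are made up.

```python
import numpy as np

def compute_targets(rewards, max_next_q, done, gamma):
    """Hypothetical helper: y_j = R_j if the episode terminates at step j+1,
    otherwise y_j = R_j + gamma * max Q(s_{j+1}, a')."""
    rewards = np.asarray(rewards, dtype=float)
    max_next_q = np.asarray(max_next_q, dtype=float)
    done = np.asarray(done, dtype=float)
    # (1 - done) is 0 for terminal transitions, which removes the future-reward term.
    return rewards + gamma * max_next_q * (1.0 - done)

# Two made-up transitions: the first continues, the second ends the episode.
y = compute_targets(
    rewards=[1.0, 100.0],
    max_next_q=[10.0, 10.0],
    done=[0, 1],
    gamma=0.5,
)
print(y)  # first target is 1 + 0.5 * 10 = 6, second is just the reward, 100
```

Multiplying by `(1 - done)` is just one way to write the "if terminal, use R_j only" case without an explicit `if`, which is handy when the targets are computed for a whole mini-batch at once.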