Problem with the final lab

Why is this needed? I mean, why should the last cost function be different?
Please help me.

Hi @wgl

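The cost function is built from the Bellman equation: the targets y come from it, and the loss measures the error between those targets and the network's Q-values. It's being used because we are estimating the optimal action-value function.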

This is explained in section 6 - Deep Q-Learning of the lab.


Thanks, but I am still confused about why, if the episode terminates at j+1, yj = Rj and not yj=

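I think all it’s saying is that if the episode terminates at step j+1, there are no further steps after it, so there is no future return to add. You drop that term and the target is just the reward, yj = Rj.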
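
To make that concrete, here is a minimal NumPy sketch of how the targets could be computed. It is not the lab's code: the function name `compute_targets`, the argument names, and the `gamma` value are assumptions made up for illustration. The `done` flag is 1 when the episode ended at step j+1, which zeroes out the future term.

```python
import numpy as np

def compute_targets(rewards, max_next_q, done, gamma=0.995):
    """Hypothetical helper: build the targets y_j for a mini-batch.

    y_j = R_j                                  if the episode ended at step j+1
    y_j = R_j + gamma * max_a' Q(s_{j+1}, a')  otherwise
    """
    rewards = np.asarray(rewards, dtype=float)
    max_next_q = np.asarray(max_next_q, dtype=float)
    done = np.asarray(done, dtype=float)
    # (1 - done) is 0 for terminal transitions, so the future term vanishes.
    return rewards + (1.0 - done) * gamma * max_next_q

# Two made-up transitions: the first continues, the second ends the episode.
rewards = [1.0, -100.0]
max_next_q = [10.0, 50.0]   # pretend outputs of the target network
done = [0, 1]

y = compute_targets(rewards, max_next_q, done, gamma=0.5)
# First target:  1 + 0.5 * 10 = 6
# Second target: just the reward, -100, because the episode terminated
print(y)
```

Even though the network may still output some value (50 here) for the terminal next state, it is ignored, because no more reward can be collected after the episode ends.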