I don’t really understand why my answer is wrong.
I calculated Q for state 5, moving left:
Q(5, ←) = 0 + 0.25·0 + 0.25²·0 + 0.25³·0 + 0.25⁴·100
Hi @Ibrahim_Mustafa,
Remember that the Bellman equation says you take the specified action once to leave the starting state, and then you behave optimally after that.
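In symbols, using the lecture convention that the reward R(s) is collected in the current state s (an assumption about notation, since this thread doesn't write it out), the equation and its application to your question look like this:

$$
Q(s, a) = R(s) + \gamma \max_{a'} Q(s', a')
$$

$$
Q(5, \leftarrow) = R(5) + \gamma \max\big(Q(4, \leftarrow),\; Q(4, \rightarrow)\big)
$$

Only the first step is forced to be "left". From state 4 onward, you take whichever action has the larger Q value.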
For state 4, the optimal action is to move right, not left.
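As a sanity check, here is a minimal Python sketch. Beyond the γ = 0.25 and the reward of 100 that appear in your sum, it assumes the six-state rover example from the lectures: reward 100 in terminal state 1, 40 in terminal state 6, 0 in states 2 to 5, and deterministic left/right moves. None of that is stated in this thread, so adjust the constants if your quiz differs. The sketch finds the optimal values by repeating the Bellman update, then computes Q(5, ←) as "move left once, then act optimally".

```python
# ASSUMED setup (not given in this thread): six-state rover MDP,
# terminal rewards 100 (state 1) and 40 (state 6), 0 elsewhere,
# deterministic moves, discount factor gamma = 0.25.
REWARDS = {1: 100.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 40.0}
TERMINAL = {1, 6}
GAMMA = 0.25
ACTIONS = {"left": -1, "right": +1}


def optimal_values(rewards, gamma, n_iters=100):
    """Approximate V*(s) = max_a Q*(s, a) by repeated Bellman optimality updates."""
    v = {s: 0.0 for s in rewards}
    for _ in range(n_iters):
        new_v = {}
        for s in rewards:
            if s in TERMINAL:
                new_v[s] = rewards[s]  # episode ends: only the terminal reward counts
            else:
                # R(s) + gamma * value of the best next state
                new_v[s] = max(rewards[s] + gamma * v[s + step] for step in ACTIONS.values())
        v = new_v
    return v


def q_value(s, action, v, rewards, gamma):
    """Q(s, a) = R(s) + gamma * V*(s'): take `action` once, then behave optimally."""
    if s in TERMINAL:
        return rewards[s]
    return rewards[s] + gamma * v[s + ACTIONS[action]]


V = optimal_values(REWARDS, GAMMA)
print(q_value(4, "left", V, REWARDS, GAMMA))   # 1.5625
print(q_value(4, "right", V, REWARDS, GAMMA))  # 2.5
print(q_value(5, "left", V, REWARDS, GAMMA))   # 0.625
print(0.25 ** 4 * 100)                         # 0.390625 (always going left)
```

Under these assumptions, Q(4, →) = 2.5 beats Q(4, ←) = 1.5625. So after the first move the rover turns around and follows 5 → 4 → 5 → 6, which gives 0 + 0.25·0 + 0.25²·0 + 0.25³·40 = 0.625, not 0.25⁴·100 = 0.390625.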