Discrepancy in return values for the same model:

In the video ‘The Return in reinforcement learning’, the return values shown are correct:

In the video ‘Bellman Equation’, the return values changed in States 2 and 3, which look wrong:
@Rose_LUO, remember that with the Bellman equation we take the reward for the current position plus the discounted optimal value of the position the action takes us to. The values you see in the top right corner of each square take that into account. For example, the top right corner of square 2 is 12.5 because, if you go right from square 2 to square 3, the optimal behavior from square 3 is to turn around and go left. Going left from square 3 has a return of 25, and discounting that gives 0.5 * 25 = 12.5. So the return for going right from square 2 is 0 (the reward at square 2) + 0.5 * 25 (the return of going left from square 3) = 12.5.
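The values above can be reproduced with a small Bellman-update sketch. This assumes the 6-square line world from the lectures with terminal rewards 100 (square 1) and 40 (square 6), zero reward elsewhere, and a discount factor of 0.5; the specific reward layout is my assumption, but it reproduces the 25 and 12.5 figures discussed above.

```python
# Hedged sketch of the 6-square line world (squares 1..6).
# Assumed rewards: 100 at square 1, 40 at square 6, 0 elsewhere; gamma = 0.5.
gamma = 0.5
rewards = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}
terminal = {1, 6}

# Q[s][a] = reward at s + gamma * best value of the state the action leads to.
# Repeated Bellman updates (value iteration) converge in this small world.
Q = {s: {"left": 0.0, "right": 0.0} for s in rewards}
for _ in range(50):
    for s in rewards:
        for a in Q[s]:
            if s in terminal:
                Q[s][a] = rewards[s]
            else:
                s_next = s - 1 if a == "left" else s + 1
                Q[s][a] = rewards[s] + gamma * max(Q[s_next].values())

print(Q[3]["left"])   # 25.0  = 0 + 0.5 * 50 (best from square 2)
print(Q[2]["right"])  # 12.5  = 0 + 0.5 * 25 (best from square 3 is to go left)
```

Note that going right from square 2 is still valued through square 3’s *best* action (going left, return 25), which is why 12.5 appears in square 2’s top right corner rather than a value computed from continuing right.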