Discrepancy in return values for the same model:

In the video ‘The Return in reinforcement learning’, the return values shown are correct:

In the video ‘Bellman Equation’, the return values changed in States 2 and 3, which look wrong:
@Rose_LUO, remember that with the Bellman equation we take the reward for the current position plus the discounted optimal value of the position the action takes us to. The values you see in the top right corner of each square take that into account. For example, the top right corner of square 2 is 12.5 because, if you go right from square 2 to square 3, the optimal behavior from square 3 is to turn around and go left. Going left from square 3 has a return of 25, and discounting that gives 0.5 * 25 = 12.5. So the return for going right from square 2 is 0 (the reward at square 2) + 0.5 * 25 (the return of going left from square 3) = 12.5.
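The values above can be reproduced with a small Bellman-update sketch. This assumes the 6-square line world from the lectures with terminal rewards 100 (square 1) and 40 (square 6), zero reward elsewhere, and a discount factor of 0.5; the specific reward layout is my assumption, but it reproduces the 25 and 12.5 figures discussed above.

```python
# Hedged sketch of the 6-square line world (squares 1..6).
# Assumed rewards: 100 at square 1, 40 at square 6, 0 elsewhere; gamma = 0.5.
gamma = 0.5
rewards = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}
terminal = {1, 6}

# Q[s][a] = reward at s + gamma * best value of the state the action leads to.
# Repeated Bellman updates (value iteration) converge in this small world.
Q = {s: {"left": 0.0, "right": 0.0} for s in rewards}
for _ in range(50):
    for s in rewards:
        for a in Q[s]:
            if s in terminal:
                Q[s][a] = rewards[s]
            else:
                s_next = s - 1 if a == "left" else s + 1
                Q[s][a] = rewards[s] + gamma * max(Q[s_next].values())

print(Q[3]["left"])   # 25.0  = 0 + 0.5 * 50 (best from square 2)
print(Q[2]["right"])  # 12.5  = 0 + 0.5 * 25 (best from square 3 is to go left)
```

Note that going right from square 2 is still valued through square 3’s *best* action (going left, return 25), which is why 12.5 appears in square 2’s top right corner rather than a value computed from continuing right.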