State-action value function [Coursera Video]

I just finished the video, but I am still confused by the state-action value function demonstrated by the professor.

I am wondering: if the current action is to move right, shouldn't Q(2, right) be computed as follows, since the reward at the right end is 40?

Q(2, right) = 0 + 0*0.5^1 + 0*0.5^2 + 0*0.5^3 + 40*0.5^4

Hello @James_Yu1,

image

The above is the only answer, because we are bound by this requirement:

image

To behave optimally after moving to grid 3, we will have to keep moving left all the way to grid 1.
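
To make the "behave optimally afterwards" part concrete, here is a minimal sketch that computes Q for every state and action. It assumes the setup from the lecture as I understand it: six grids in a row, grids 1 and 6 are terminal, the discount factor is 0.5, the reward at grid 6 is 40, and the reward at grid 1 is 100 (that last value is not stated in this thread, so treat it as an assumption). The names `REWARDS`, `TERMINAL`, and `compute_q` are made up for illustration.

```python
GAMMA = 0.5  # discount factor, taken from the 0.5^k terms in the question

# Assumed rewards: 100 at grid 1 (assumption), 40 at grid 6, 0 elsewhere.
REWARDS = {1: 100.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 40.0}
TERMINAL = {1, 6}
ACTIONS = {"left": -1, "right": +1}


def compute_q(num_sweeps=50):
    """Repeatedly apply Q(s, a) = R(s) + GAMMA * max_a' Q(s', a')."""
    q = {(s, a): 0.0 for s in REWARDS for a in ACTIONS}
    for _ in range(num_sweeps):
        for s in REWARDS:
            for a, step in ACTIONS.items():
                if s in TERMINAL:
                    # In a terminal state the return is just its reward.
                    q[(s, a)] = REWARDS[s]
                else:
                    nxt = s + step
                    # Take action a once, then act optimally from nxt.
                    best_next = max(q[(nxt, b)] for b in ACTIONS)
                    q[(s, a)] = REWARDS[s] + GAMMA * best_next
    return q


q = compute_q()
print("Q(2, right) =", q[(2, "right")])
print("Q(3, left)  =", q[(3, "left")], " Q(3, right) =", q[(3, "right")])
```

Under these assumptions, Q(3, left) comes out larger than Q(3, right), which is why the optimal behaviour after reaching grid 3 is to turn back left, and why Q(2, right) is built from the reward at grid 1 rather than the 40 at grid 6.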

Cheers,
Raymond


@James_Yu1 If you really do keep moving right, you will eventually reach grid 6 and be rewarded 40 points. However, that does not change the definition of Q: Q(2, right) is always 12.5.
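
As a rough check, the two paths can be compared directly. This sketch assumes a discount factor of 0.5 and a reward of 100 at grid 1 (the 100 is my assumption, not stated in this thread); `discounted_return` is just a helper name made up for illustration.

```python
GAMMA = 0.5  # discount factor from the question


def discounted_return(rewards, gamma=GAMMA):
    """Sum of rewards[t] * gamma**t over the visited states."""
    return sum(r * gamma**t for t, r in enumerate(rewards))


# Path 2 -> 3 -> 4 -> 5 -> 6: keep moving right until the 40 at grid 6.
always_right = [0, 0, 0, 0, 40]

# Path 2 -> 3 -> 2 -> 1: move right once, then behave optimally
# (assumes the reward at grid 1 is 100).
right_then_optimal = [0, 0, 0, 100]

print(discounted_return(always_right))        # 2.5
print(discounted_return(right_then_optimal))  # 12.5
```

The first number is the return of the "always right" path from the question. It is a valid return for that path, but it is not Q(2, right), because Q only fixes the first action and assumes optimal behaviour after it.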
