Reference to the “Return in Reinforcement Learning” video: if the rover is at position 4 and it moves left to position 1, why do we add a reward of 0 for position 4? Shouldn’t the rover get 0 for coming to position 3, 0 for coming to position 2, and 100 for coming to position 1?

Please give the time mark for where you captured that image. It’s important because there are a lot of different examples given.

At around 1:23, but it doesn’t matter because this is a running example in the course, and in every calculation the reward for the current state is added to the total return. My question: if the rover is currently at location 1 and has to move to location 6, is the return calculated as

100 + 0 + 0 + 0 + 0 + 40 = 140

or 0 + 0 + 0 + 0 + 40 = 40?

From the example calculations in the course, it seems it should be option 1 (140), which confuses me: if the rover now has to return from position 6 to position 1, we will again be adding the reward for the current state (40), which is double counting.

Attaching another screenshot that makes the calculations clearer.

In the slide in your first message, the numbers are in the order that they’re computed, starting from state 4 and proceeding to state 1.

They’re not in the same order as the states in the diagram above the numbers.

You have to pick this up from listening to the narration carefully.

Thanks for the response, but my question is why we are adding the reward for state 4 (the starting or current state) when the rover is already there. Shouldn’t we be adding rewards only for the states the rover moves to, namely 3, 2, and 1?

There is no reward for state 4. The 100 point reward is for moving from state 2 to state 1.

Ignore the fact that “100” is written below the numbers 4 and 5. That’s not what it corresponds to.

Listen carefully to what Andrew says in the lecture.

Thanks for your patience

Yes, I understand that the reward of 100 is for state 1, and the position of the numbers in the figure is not the problem. But if you watch the video https://www.coursera.org/learn/unsupervised-learning-recommenders-reinforcement-learning/lecture/UrcMA/mars-rover-example

from 2:57, Prof. Andrew says:

" Let’s look at some examples of what might happen if the robot was to go left, starting from state 4. Then initially starting from state 4, it will receive a reward of zero, and after going left, it gets to state 3, where it receives again a reward of zero. Then it gets to state 2, receives the reward is 0, and finally just to state 1, where it receives a reward of 100."

why is the rover getting a reward (zero in this case) for starting from state 4?

If you look at the explanation for question 3 from the quiz (screenshot attached),

it says “we get zero reward in state 5”. One cannot get this answer without considering the reward for state 5 as

0*0.25^0 + 0*0.25^1 + 0*0.25^2 + 40*0.25^3 = 0.625

The first element (0*0.25^0) in this equation is for state 5, and that’s where my confusion lies: why are we adding the reward for the starting state?
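To check the arithmetic in the quiz example, here is a minimal sketch; the reward list [0, 0, 0, 40] and the discount factor 0.25 are taken from the calculation above, and `discounted_return` is just an illustrative helper name, not something from the course:

```python
def discounted_return(rewards, gamma):
    # Sum of gamma**t * reward_t, with t = 0 at the current (starting) state.
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Rewards seen going left from state 5: 0 in state 5, 0, 0, then 40 at the terminal state.
print(discounted_return([0, 0, 0, 40], 0.25))  # 0.625
```

Note that the first term of the sum (t = 0) is the reward of the state the rover starts in, which is exactly the convention being asked about.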

At the point in the lectures where you got the screen captures, the concept of the Bellman Equation and discounted rewards had not yet been introduced.

I recommend you review the lectures again in detail.

My question is very simple and conceptual: if the rover moves from state 4 to state 1, should the reward for state 4 (which is zero in this example) be considered in the return calculation? In all the examples I have posted above, it seems that the reward for state 4 is being considered.

Setting all that aside, could you please tell me what the return should be if the rover moves from state 1 to state 6, assuming a discount factor of 1?

From state 1 to 6 in which image? The one with the discount, or the one in your first two posts without the discount?

I had mentioned a discount factor of 1, but no worries. I checked the videos again, and it seems the reward for the current state is always added to the return calculation. Thanks!
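For completeness, under that convention (reward of the current state counted at t = 0) and with a discount factor of 1, the state-1-to-state-6 return from the first post works out to option 1. A quick sketch, with the helper name mine and the rewards taken from the Mars rover example:

```python
def undiscounted_return(rewards):
    # With a discount factor of 1, the return is just the sum of the rewards,
    # starting with the reward of the current state (t = 0).
    return sum(rewards)

# Moving right from state 1 to state 6: 100 in state 1, zeros in between, 40 at state 6.
print(undiscounted_return([100, 0, 0, 0, 0, 40]))  # 140
```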