Reinforcement Learning Initial State and reward

Manoj_Agrawal · March 17, 2023, 9:20pm

In reference to the “Return in Reinforcement Learning” video: if the rover is at position 4 and moves left to reach position 1, why do we add a reward of 0 for position 4? Shouldn’t the rover get 0 for reaching position 3, 0 for reaching position 2, and 100 for reaching position 1?

TMosh · March 18, 2023, 1:34am

Please give the time mark for where you captured that image. It’s important because there are a lot of different examples given.

Manoj_Agrawal · March 20, 2023, 3:44pm

At around 1:23, but it doesn’t really matter, because this is a running example in the course and in every calculation the reward for the current state is added to the total return. My question: if the rover is currently at location 1 and has to move to location 6, is the return calculated as
100 + 0 + 0 + 0 + 0 + 40 = 140
or 0 + 0 + 0 + 0 + 40 = 40?
From the example calculations in the course, it seems it should be option 1 (140). That confuses me, because if the rover now has to return from position 6 to position 1, we will again add the reward for the current state (40), which looks like double counting.
Attaching another screenshot that makes the calculations clear.
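A minimal Python sketch of the two options, assuming the Mars rover rewards quoted in this thread (100 in state 1, 40 in state 6, 0 in states 2 to 5) and a discount factor of 1. The helper `path_return` and its `include_start` flag are hypothetical names for illustration, not course code; `include_start=True` mirrors the lecture examples, which count the reward of the state the rover starts in.

```python
# Rewards for states 1..6 in the Mars rover example. Index 0 is unused
# so that REWARDS[s] is the reward for state s.
REWARDS = [None, 100, 0, 0, 0, 0, 40]


def path_return(path, include_start=True):
    """Undiscounted return (discount factor 1) along a sequence of states.

    include_start=True counts the reward of the starting state, as in the
    lecture examples; include_start=False counts only the states the rover
    moves into.
    """
    visited = path if include_start else path[1:]
    return sum(REWARDS[s] for s in visited)


moving_right = [1, 2, 3, 4, 5, 6]
print(path_return(moving_right, include_start=True))   # 140 (option 1)
print(path_return(moving_right, include_start=False))  # 40  (option 2)
```

Under this convention each call scores one trajectory. A trip from 6 back to 1 would be a separate trajectory with its own starting state, so counting 40 at the start of that trip does not change the return of the first trip.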

TMosh · March 21, 2023, 5:55am

In the slide in your first message, the numbers are in the order that they’re computed - starting from state 4 and proceeding to state 1.

They’re not in the same order as the states in the diagram above the numbers.

You have to pick this up from listening to the narration carefully.

Manoj_Agrawal · March 21, 2023, 6:41am

Thanks for the response, but my question is why we are adding the reward for state 4 (the starting, or current, state) when the rover is already there. Shouldn’t we add rewards only for the states the rover moves to, namely 3, 2 and 1?

TMosh · March 21, 2023, 7:30am

There is no reward for state 4. The 100-point reward is for moving from state 2 to state 1.
Ignore the fact that “100” is written below the numbers 4 and 5. That’s not what it correlates to.

Listen carefully to what Andrew says in the lecture.

Manoj_Agrawal · March 21, 2023, 3:13pm

Thanks for your patience.
Yes, I understand that the reward of 100 is for state 1, and the position of the numbers on the figure is not the problem. But if you watch the video https://www.coursera.org/learn/unsupervised-learning-recommenders-reinforcement-learning/lecture/UrcMA/mars-rover-example
from 2:57, Prof Andrew says:
" Let’s look at some examples of what might happen if the robot was to go left, starting from state 4. Then initially starting from state 4, it will receive a reward of zero, and after going left, it gets to state 3, where it receives again a reward of zero. Then it gets to state 2, receives a reward of 0, and finally gets to state 1, where it receives a reward of 100."

Why is the rover getting a reward (zero in this case) for starting in state 4?
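The reward sequence in the quoted narration can be written out explicitly. This is a minimal sketch, assuming a hypothetical `discounted_return` helper in which the first reward in the list is the one received in the starting state and is weighted by gamma^0. With gamma = 1, as at this point in the lecture, the return is just the sum; the gamma = 0.5 call is only an illustration of how the weights shift.

```python
def discounted_return(rewards, gamma=1.0):
    """Return R1 + gamma*R2 + gamma**2*R3 + ... for a list of rewards.

    rewards[0] is the reward received in the starting state, so it is
    weighted by gamma**0 == 1.
    """
    return sum(r * gamma**k for k, r in enumerate(rewards))


# Going left from state 4: state 4 (0), state 3 (0), state 2 (0), state 1 (100).
rewards_from_state_4 = [0, 0, 0, 100]
print(discounted_return(rewards_from_state_4))             # 100.0
print(discounted_return(rewards_from_state_4, gamma=0.5))  # 12.5
```

Because the starting state's reward here is 0, including it changes nothing in the sum; what it does fix is that the 100 from state 1 gets weight gamma^3 once discounting is introduced.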

If you look at the explanation for question 3 in the quiz (screenshot attached),

it says “we get zero reward in state 5”. One cannot get this answer without counting the reward for state 5, as in
0 × 0.25^0 + 0 × 0.25^1 + 0 × 0.25^2 + 40 × 0.25^3 = 0.625
The first term (0 × 0.25^0) in this equation is for state 5, and that’s where my confusion lies: why are we adding the reward for the starting state?
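The quiz arithmetic can be checked with a short sketch. Assumptions: the reward sequence [0, 0, 0, 40] and discount factor 0.25 are read off the equation above, and `discounted_return` is a hypothetical helper. The second call models one reading of the alternative in the question: drop the starting state's reward and start the exponents at 0 for the first state the rover moves into. That gives a different number, so 0.625 comes out only when the starting state's reward is included with weight 0.25^0.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k], with rewards[0] weighted by gamma**0."""
    return sum(r * gamma**k for k, r in enumerate(rewards))


rewards = [0, 0, 0, 40]  # the four terms in the quiz explanation
gamma = 0.25

# Counting the starting state's reward (the 0 * 0.25**0 term):
print(discounted_return(rewards, gamma))      # 0.625

# Dropping the starting state's reward and restarting the exponents at 0
# for the first state the rover moves into:
print(discounted_return(rewards[1:], gamma))  # 2.5
```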

TMosh · March 21, 2023, 5:07pm

At the point in the lectures where you got the screen captures, the concept of the Bellman Equation and discounted rewards had not yet been introduced.

I recommend you review the lectures again in detail.

Manoj_Agrawal · March 21, 2023, 5:36pm

My question is simple and conceptual: if the rover moves from state 4 to state 1, should the reward for state 4 (which is zero in this example) be included in the return calculation? In all the examples I have posted above, it seems that the reward for state 4 is included.

If we set everything else aside, could you please tell me what the return should be if the rover moves from state 1 to state 6, assuming a discount factor of 1?

TMosh · March 22, 2023, 3:40am

From state 1 to 6 in which image? The one with the discount, or the one in your first two posts without the discount?

Manoj_Agrawal · March 22, 2023, 4:19am

I had mentioned a discount factor of 1, but no worries. I checked the videos again, and it seems that the reward for the current state is always added in the return calculation. Thanks.
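The same conclusion follows from writing the return recursively, G(s) = R(s) + gamma * G(next state), which is the idea the Bellman equation mentioned above builds on. A minimal sketch, assuming the Mars rover rewards from this thread, a policy of always moving left, and that the episode ends on reaching state 1 or state 6; `return_going_left` and `TERMINAL` are hypothetical names. Because every G starts with R(s) for the current state, the starting state's reward is always part of the return.

```python
REWARDS = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}
TERMINAL = {1, 6}  # assumption: the episode ends in state 1 or state 6


def return_going_left(state, gamma=1.0):
    """Return from `state` if the rover always moves left.

    Starts with the reward of the current state, then adds the discounted
    return from the next state: G(s) = R(s) + gamma * G(s - 1).
    """
    if state in TERMINAL:
        return REWARDS[state]
    return REWARDS[state] + gamma * return_going_left(state - 1, gamma)


for s in range(1, 6):
    print(s, return_going_left(s, gamma=0.5))
```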

Related topics

Topic	Category	Replies	Views	Activity
Question on discounting	Unsupervised Learning, Recommenders, Reinforcement week-3	8	482	November 7, 2022
Reinforcement - Terminology of "first step"	Unsupervised Learning, Recommenders, Reinforcement week-3	5	323	December 8, 2023
Possible solution error in Reinforcement Learning Quiz?	Unsupervised Learning, Recommenders, Reinforcement week-3	11	575	January 20, 2023
Week 3 lecture video has error _ State_action Value Function definition	Unsupervised Learning, Recommenders, Reinforcement week-3	5	449	August 13, 2024
The Return in Reinforcement Learning	Unsupervised Learning, Recommenders, Reinforcement week-3	2	569	September 15, 2022