Reinforcement - Terminology of "first step"

Given this quiz question:

“You are using reinforcement learning to fly a helicopter. Using a discount factor of 0.75, your helicopter starts in some state and receives rewards -100 on the first step, -100 on the second step, and 1000 on the third and final step (where it has reached a terminal state). What is the return?”
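For reference, this is how I set up the return computation (a quick Python sketch of my own, not from the course code, in case I have the mechanics wrong):

```python
# Discounted return for the quiz example: the reward on the first step
# is undiscounted, and each later step picks up another factor of 0.75.
gamma = 0.75
rewards = [-100, -100, 1000]   # step 1, step 2, step 3 (terminal)

G = sum(gamma**t * r for t, r in enumerate(rewards))
print(G)   # -100 + 0.75*(-100) + 0.75**2 * 1000 = 387.5
```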

My confusion: If I have an agent that starts in a particular state….

a. Why is the terminology for starting in that state “receives rewards -100 on the first step”? In my mind, it’s not a step - it’s a state - and rewards should be calculated against potential actions. Is that just the way the algorithm is defined? It is consistent with the lecture.
b. Why, logically, do we place a reward on that starting state? Is it because the agent may have been placed in an advantageous place to start (e.g., the Mars rover landed on a pot of gold)?

Thanks guys!


If I recall correctly, you don’t get a reward for the state you are currently in. You get a reward for moving to the next state.


@TMosh that sounds logical to me, but I looked back at the lecture and, matching it up with the quiz, it seems there is a concept of an undiscounted reward in the starting state. I’ll attach my quiz result, which reinforces :cowboy_hat_face: this.
[screenshot: quiz result]

The best rationalization I can think of (big newbie alert) is that there should be a value associated with the “action” of staying in that state. This would be relevant if the other available actions would result in lesser (potentially negative) returns. For example, the Mars rover lands at the top of a pointy plateau and would tumble if it moved in any direction. I may be waaaay off here, so feel free to zap this and redirect me.


I’ll review the lecture and reply further.


Think I’m good. This concept is touched on in a subsequent lecture (the Bellman Equation one), though with more intuitive terminology. Andrew states that the agent gets a reward “right away,” which would apply to any state, including the initial state…
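For any future learners, this is the way I now read the Bellman equation from that lecture (writing it from memory, so please double-check against the video):

$$Q(s, a) = R(s) + \gamma \, \max_{a'} Q(s', a')$$

The R(s) term is the reward you get “right away” in the current state, with no discount; the discount factor only kicks in from the next state onward, which matches the quiz treating the first step’s -100 as undiscounted.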

@TMosh and others - please lmk if you think this is correct for future learners.


Thanks for your report.