Definition of Reward

mirko.bruhn · October 21, 2022, 1:23pm

Hi,
I am a bit confused about the definition of Reward.
I understood that the Reward is linked to the state we are in (S), not to the next state (S’).
But if I look at the Lunar Lander, the Reward seems to be linked to an action (firing a thruster) and/or the next state (S’, landing safely).
Does somebody have a clear definition of Reward?

ritik5 · October 21, 2022, 3:38pm

Hi @mirko.bruhn, this is a rich discussion that I think can help you understand how rewards help in selecting the next state.
The reward is linked to what the lander's next state should be. For example, if the lander is in a state where moving left earns more reward than moving right, then to maximize the reward it will move left. The reward is linked to the current state, but it is calculated by taking other states into account (in the lander's case, the reward at the left end or the right end and the number of steps needed to reach it).
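To make the "reward at the left end or right end and steps to reach there" idea concrete, here is a minimal sketch. It assumes a hypothetical 1-D world with six states, terminal rewards only at the two ends, and a discount factor of 0.5; none of these numbers come from this thread.

```python
# Hypothetical 1-D world: states 0..5, non-zero rewards only at the two ends.
# All numbers are illustrative assumptions, not taken from the thread.
REWARDS = [100.0, 0.0, 0.0, 0.0, 0.0, 40.0]
GAMMA = 0.5  # assumed discount factor


def discounted_return(start, direction):
    """Return collected from `start` when always moving in `direction`.

    `direction` is -1 (left) or +1 (right). The current state's reward is
    counted first; each later state's reward is discounted by one more
    factor of GAMMA.
    """
    total, discount, state = 0.0, 1.0, start
    while True:
        total += discount * REWARDS[state]
        if state in (0, len(REWARDS) - 1):  # terminal state reached
            return total
        state += direction
        discount *= GAMMA


def best_direction(start):
    """Pick the direction with the larger discounted return."""
    left = discounted_return(start, -1)
    right = discounted_return(start, +1)
    return ("left", left) if left >= right else ("right", right)
```

Under these assumptions, a state near the left end prefers moving left and a state next to the right end prefers moving right, because more steps mean more discounting of the end reward.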

mirko.bruhn · October 21, 2022, 4:27pm

Hi @ritik5,
Thank you for your answer. There are actually two things confusing me.
One is the definition of reward in the Lunar Lander:

The Lunar Lander environment has the following reward system:

Landing on the landing pad and coming to rest is about 100-140 points.
If the lander moves away from the landing pad, it loses reward.
If the lander crashes, it receives -100 points.
If the lander comes to rest, it receives +100 points.
Each leg with ground contact is +10 points.
Firing the main engine is -0.3 points each frame.
Firing the side engine is -0.03 points each frame.

I feel that actions and reaching a certain state are mixed in this definition. That would mean that R = f(a, s’), so the reward is a function of the next state and also of the action taken? As far as I remember, it was not defined like this in the lecture.
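A rough sketch of what a reward of the form R = f(s, a, s’) could look like for the rules listed above. This is not the environment's actual implementation: the `LanderState` fields, the action names, and the distance-based term are hypothetical simplifications chosen only to show how action terms and state-change terms can sit in one function.

```python
from dataclasses import dataclass


@dataclass
class LanderState:
    # Hypothetical, heavily simplified state; the real environment tracks more.
    distance_to_pad: float
    legs_in_contact: int
    crashed: bool = False
    at_rest: bool = False


# Hypothetical action labels.
MAIN_ENGINE, SIDE_ENGINE, NOOP = "main", "side", "noop"


def reward(s, a, s_next):
    """Reward for taking action `a` in state `s` and arriving in `s_next`."""
    r = 0.0
    # Terms that depend on the change of state (s -> s_next).
    r += s.distance_to_pad - s_next.distance_to_pad  # assumed shaping term
    r += 10.0 * (s_next.legs_in_contact - s.legs_in_contact)
    # Terms that depend on the action taken in this frame.
    if a == MAIN_ENGINE:
        r -= 0.3
    elif a == SIDE_ENGINE:
        r -= 0.03
    # Terms that depend on the state that was reached.
    if s_next.crashed:
        r -= 100.0
    elif s_next.at_rest:
        r += 100.0
    return r
```

A state-only reward R(s) is then just the special case in which the function ignores `a` and `s_next`.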

mirko.bruhn · October 21, 2022, 4:35pm

The second thing about the reward is the indices (timesteps):

An experience tuple is defined like this: $(S_t, A_t, R_t, S_{t+1})$, but is $R_t$ the reward for reaching $S_{t+1}$? If so, shouldn't it be $R_{t+1}$?

And in the Bellman Equation, the reward is defined like this:
$y = R + \gamma \max_{a'} Q(s', a'; w)$
But here the reward R seems to be linked to the current state s. Following the logic above, would this be the reward received in the previous timestep, when we reached state s (not s’)?
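One way to see how the indices line up is to compute the target y directly from a single experience tuple. The sketch below assumes a tabular Q stored in a dictionary, a discount factor of 0.9, and hypothetical state and action names. It reads the tuple's reward as the one observed after taking the action in s, in the same step in which s’ is observed; whether that reward is written with subscript t or t+1 is a labelling convention.

```python
GAMMA = 0.9  # assumed discount factor


def td_target(experience, q, actions, gamma=GAMMA, done=False):
    """Compute y = R + gamma * max_a' Q(s', a') for one experience tuple.

    `experience` is (s, a, r, s_next): r is the reward observed after taking
    action a in state s, together with the next state s_next.
    `q` maps (state, action) pairs to estimated values.
    """
    s, a, r, s_next = experience
    if done:
        # No future return after a terminal step.
        return r
    return r + gamma * max(q[(s_next, a_next)] for a_next in actions)


# Usage with hypothetical states, actions, and Q-values.
ACTIONS = ["left", "right"]
q_table = {("s1", "left"): 1.0, ("s1", "right"): 2.0}
y = td_target(("s0", "left", -0.3, "s1"), q_table, ACTIONS)
```

In this reading, the max is taken over actions in s’, while r belongs to the transition from s to s’.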

rmwkwok · October 22, 2022, 2:21am

Hello @mirko.bruhn,

We shouldn’t bind ourselves to a single way of defining Reward. In the lecture video, we saw an example where the reward depends only on the state and the rewards are constant. In the case of the Lunar Lander, the reward is calculated from the change of states. The two definitions are different.

Here is a relevant discussion that also links to another relevant discussion, and I suggest reading them before considering how you would like to “exit” this question.

Cheers,
Raymond
