Definition of Reward

Hi,
I am a bit confused about the definition of Reward.
I understood that the Reward is linked to the state we are in (S), not to the next state (S’).
But if I look at the Lunar Lander, the Reward seems to be linked to an action (firing a thruster) and/or the next state (S’, landing safely).
Does somebody have a clear definition of Reward?

Hi @mirko.bruhn, this is a rich discussion which I think can help you understand how rewards help in selecting the next state.
The reward is linked to what the lander's next state should be. For example, if the lander is in a state where moving left earns more reward than moving right, then to maximize the reward the lander will move to the left. The reward is linked to the current state, but it is calculated by taking other states into account (in the case of the lander, the reward at the left end or right end and the number of steps needed to reach it).

Hi @ritik5 ,
Thank you for your answer. There are actually two things confusing me.
One is the definition of reward in the Lunar Lander:

The Lunar Lander environment has the following reward system:

  • Landing on the landing pad and coming to rest is about 100-140 points.
  • If the lander moves away from the landing pad, it loses reward.
  • If the lander crashes, it receives -100 points.
  • If the lander comes to rest, it receives +100 points.
  • Each leg with ground contact is +10 points.
  • Firing the main engine is -0.3 points each frame.
  • Firing the side engine is -0.03 points each frame.

It seems to me that actions and reaching a certain state are mixed in this definition. That would mean R = f(a, s’), i.e. the reward is a function of both the action taken and the next state. As far as I remember, it was not defined like this in the lecture.

And the second thing about the reward are the indices (timesteps):

An experience tuple is defined like this: $(S_t, A_t, R_t, S_{t+1})$, but $R_t$ is the reward of reaching $S_{t+1}$? So it should be $R_{t+1}$?

And in the Bellman Equation, the reward is defined like this:
$$y = R + \gamma \max_{a'} Q(s', a'; w)$$
But here the reward R seems to be linked to the current state s. Following the logic above, then here this would be the reward received in the previous timestep, when we reached state s (not s’)?
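
To make the indexing question concrete, here is a minimal Python sketch of how the target y above can be computed from one experience tuple. The function name `td_target`, the numbers, and the value of gamma are made up for illustration and are not taken from the course lab. The only thing the sketch relies on is that R is the reward stored in the same tuple as s and s’, i.e. the reward returned for that single step.

```python
def td_target(reward, next_q_values, gamma, done=False):
    """Compute y = R + gamma * max_a' Q(s', a').

    If the episode ended on this step there is no next state to
    bootstrap from, so the target is just the reward.
    """
    if done:
        return float(reward)
    return float(reward + gamma * max(next_q_values))


# Hypothetical experience tuple (s, a, r, s_next).
# r is whatever the environment returned for taking action a in state s.
s, a, r, s_next = 0, 2, -0.3, 1

# Made-up Q(s_next, a') values for four actions.
q_next = [1.0, 4.0, 2.5, 0.0]

y = td_target(r, q_next, gamma=0.9)
print(y)
```

Whether that reward is written as R_t or R_{t+1} is a labelling convention; the computation of y is the same either way.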

Hello @mirko.bruhn,

We can’t restrict ourselves to one single way of defining Reward. In the lecture video, we saw an example in which the reward depends only on the state and the rewards are constant. In the case of Lunar Lander, the reward is calculated based on the change of states. These are two different definitions.
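
As a rough illustration of the two styles, here is a small Python sketch. Everything in it is hypothetical: the state and action ids, the reward values in the table, and the function names are invented for this post, and it is not the actual Lunar Lander reward code. The -0.3 main-engine cost and the +100 landing bonus are borrowed from the list quoted above only to make the example recognizable.

```python
# Hypothetical toy encoding, not the real Lunar Lander state/action space.
MAIN_ENGINE = 2   # action id for firing the main engine
LANDED = 99       # state id meaning "came to rest on the pad"


def state_only_reward(s, a, s_next):
    """Lecture-style reward: looks only at the current state s."""
    terminal_rewards = {0: 100.0, 5: 40.0}  # made-up values
    return terminal_rewards.get(s, 0.0)


def transition_reward(s, a, s_next):
    """Transition-style reward: action cost plus a bonus for the state reached."""
    fuel_cost = -0.3 if a == MAIN_ENGINE else 0.0
    landing_bonus = 100.0 if s_next == LANDED else 0.0
    return fuel_cost + landing_bonus


print(state_only_reward(0, MAIN_ENGINE, 1))
print(transition_reward(7, MAIN_ENGINE, LANDED))
```

Both functions share the signature (s, a, s_next), so the state-only reward can be read as a special case that simply ignores the action and the next state.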

Here is a relevant discussion that also links to another relevant discussion, and I suggest you read them before considering how you would like to “exit” this question.

Cheers,
Raymond