How does R(s) reward an action (e.g. firing engines) which is not part of the state?

The lectures said that the reward is a function of the state (“Mars rover example” @ time 5:24), but in the lunar landing assignment (C3_W3_A1_Assignment), the last two rewards in the list concern firing engines, which are actions, not states. Here are excerpts from the assignment showing that the state vector does not include engine firing:

3.2 Observation Space

The agent’s observation space consists of a state vector with 8 variables:

  • Its (𝑥,𝑦) coordinates. The landing pad is always at coordinates (0,0).
  • Its linear velocities (𝑥˙,𝑦˙).
  • Its angle 𝜃.
  • Its angular velocity 𝜃˙.
  • Two booleans, 𝑙 and 𝑟, that represent whether each leg is in contact with the ground or not.

3.3 Rewards

The Lunar Lander environment has the following reward system:

  • Landing on the landing pad and coming to rest is about 100-140 points.
  • If the lander moves away from the landing pad, it loses reward.
  • If the lander crashes, it receives -100 points.
  • If the lander comes to rest, it receives +100 points.
  • Each leg with ground contact is +10 points.
  • Firing the main engine is -0.3 points each frame. <== THIS IS AN ACTION, NOT A STATE
  • Firing the side engine is -0.03 points each frame. <== THIS IS AN ACTION, NOT A STATE

Since firing the engines is not part of the state, how does R(s) reward engine firing?

Hello Michael @mosofsky,

We need to remind ourselves that the reward is decided entirely by the environment; it is not bound by the simplifications of any one lecture formula. In the general MDP formulation, the reward may depend on the action as well, written R(s, a) or even R(s, a, s′); the R(s) in the lecture is a simplification that happens to fit the Mars rover example. The Lunar Lander environment is a case where the designers chose to give negative points for engine firing: in the environment’s source code, the reward is first computed from how the state changes, and the next few lines then add the penalties for engine firing. In fact, many of the items in the reward list above describe actions or changes of state rather than a single state.
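To make that concrete, here is a minimal sketch in Python (not the real gym source) of how a per-frame reward can depend on both the state transition and the chosen action. The name `step_reward` and the shaping scores are illustrative; the engine penalties are the -0.3 and -0.03 points from the list above.

```python
def step_reward(prev_shaping, shaping, action):
    """Hypothetical per-frame reward in the style of R(s, a).

    prev_shaping / shaping: scalar scores computed from the state vector
    (distance to the pad, speed, tilt, leg contact) before and after the step.
    action: 0 = do nothing, 1 = fire left engine, 2 = fire main engine,
            3 = fire right engine (the Lunar Lander discrete action space).
    """
    reward = shaping - prev_shaping   # reward for how the state changed
    if action == 2:                   # firing the main engine
        reward -= 0.30                # -0.3 points each frame
    elif action in (1, 3):            # firing a side engine
        reward -= 0.03                # -0.03 points each frame
    return reward
```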

We can’t change the environment, but how to model reward or how to use reward is a question for us to consider. Did you check out how reward is being used in the assignment? This is a relevant discussion.

Raymond