The lectures say that the reward is a function of the state alone (“Mars rover example” @ time 5:24), but in the lunar landing assignment (C3_W3_A1_Assignment) the last two rewards in the list concern firing engines, which are actions, not state. Here are excerpts from the assignment showing that the state vector does not include engine firing:
3.2 Observation Space
The agent’s observation space consists of a state vector with 8 variables:
- Its (𝑥,𝑦) coordinates. The landing pad is always at coordinates (0,0).
- Its linear velocities (𝑥˙,𝑦˙).
- Its angle 𝜃.
- Its angular velocity 𝜃˙.
- Two booleans, 𝑙 and 𝑟, that represent whether each leg is in contact with the ground or not.
3.3 Rewards
The Lunar Lander environment has the following reward system:
- Landing on the landing pad and coming to rest is about 100-140 points.
- If the lander moves away from the landing pad, it loses reward.
- If the lander crashes, it receives -100 points.
- If the lander comes to rest, it receives +100 points.
- Each leg with ground contact is +10 points.
- Firing the main engine is -0.3 points each frame. <== THIS IS AN ACTION, NOT A STATE
- Firing the side engine is -0.03 points each frame. <== THIS IS AN ACTION, NOT A STATE
Since firing the engines is not part of the state, how can a reward function of the form R(s) penalize engine firing? Doesn't this environment actually require R(s, a)?
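To make the mismatch concrete, here is a minimal sketch of the reward terms listed above, written out as a function. All names are hypothetical (this is not the Gym source); the point is simply that the fuel-penalty terms cannot be computed from the state vector alone:

```python
def reward(state, action):
    """Sketch of the per-frame reward terms from the assignment's list.

    state  -- dict with the 8 observation variables; only the two leg-contact
              booleans 'l' and 'r' matter for the terms shown here.
    action -- one of 'noop', 'fire_main', 'fire_left', 'fire_right'
              (hypothetical labels for the four discrete actions).
    """
    r = 0.0

    # State-dependent term: +10 points per leg in contact with the ground.
    r += 10.0 * state["l"] + 10.0 * state["r"]

    # Action-dependent terms: fuel penalties. These require knowing the
    # action taken, so the full reward is R(s, a), not R(s).
    if action == "fire_main":
        r -= 0.3
    elif action in ("fire_left", "fire_right"):
        r -= 0.03

    return r
```

If both legs touch down while the main engine fires, this sketch gives 10 + 10 - 0.3 = 19.7 for that frame; a pure R(s) could never produce the -0.3 term, since the state vector carries no record of the engine firing.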