How does R(s) reward an action (e.g. firing engines) which is not part of the state?

The lectures said that the reward is a function of the state (“Mars rover example” @ time 5:24), but in the lunar landing assignment (C3_W3_A1_Assignment), the last two rewards in the list concern firing engines, which are actions, not states. Here are excerpts from the assignment showing that the state vector does not include engine firing:

3.2 Observation Space

The agent’s observation space consists of a state vector with 8 variables:

  • Its (𝑥,𝑦) coordinates. The landing pad is always at coordinates (0,0).
  • Its linear velocities (𝑥˙,𝑦˙).
  • Its angle 𝜃.
  • Its angular velocity 𝜃˙.
  • Two booleans, 𝑙 and 𝑟, that represent whether each leg is in contact with the ground or not.

3.3 Rewards

The Lunar Lander environment has the following reward system:

  • Landing on the landing pad and coming to rest is about 100-140 points.
  • If the lander moves away from the landing pad, it loses reward.
  • If the lander crashes, it receives -100 points.
  • If the lander comes to rest, it receives +100 points.
  • Each leg with ground contact is +10 points.
  • Firing the main engine is -0.3 points each frame. <== THIS IS AN ACTION, NOT A STATE
  • Firing the side engine is -0.03 points each frame. <== THIS IS AN ACTION, NOT A STATE

Since firing the engines is not part of the state, how does R(s) reward engine firing?

Hello Michael @mosofsky,

We need to remind ourselves that the reward is decided entirely by the environment; it is not bound by the simplifications of any one lecture formula. In the general MDP formulation, the reward may depend on the action as well, written R(s, a) or even R(s, a, s′); the R(s) in the lecture is a simplification that happens to fit the Mars rover example. The Lunar Lander environment is a case where the designers chose to give negative points for engine firing: in the environment’s source code, the reward is first computed from how the state changes, and the next few lines then add the penalties for engine firing. In fact, many of the items in the reward list above describe actions or changes of state rather than a single state.
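To make that concrete, here is a minimal sketch in Python (not the real gym source) of how a per-frame reward can depend on both the state transition and the chosen action. The name `step_reward` and the shaping scores are illustrative; the engine penalties are the -0.3 and -0.03 points from the list above.

```python
def step_reward(prev_shaping, shaping, action):
    """Hypothetical per-frame reward in the style of R(s, a).

    prev_shaping / shaping: scalar scores computed from the state vector
    (distance to the pad, speed, tilt, leg contact) before and after the step.
    action: 0 = do nothing, 1 = fire left engine, 2 = fire main engine,
            3 = fire right engine (the Lunar Lander discrete action space).
    """
    reward = shaping - prev_shaping   # reward for how the state changed
    if action == 2:                   # firing the main engine
        reward -= 0.30                # -0.3 points each frame
    elif action in (1, 3):            # firing a side engine
        reward -= 0.03                # -0.03 points each frame
    return reward
```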

We can’t change the environment, but how to model reward or how to use reward is a question for us to consider. Did you check out how reward is being used in the assignment? This is a relevant discussion.

Raymond