Hello Michael @mosofsky,
We need to remind ourselves that the reward is decided solely by the environment and is not constrained by any theory or equation. The lunar lander environment is one such case: it chooses to give a negative reward for firing the engines. This code line shows that the lunar lander's reward is calculated from how the state changes, and the next few lines handle the rewards for engine firing. In fact, many of the items in the list of rewards depend on actions or changes of state, not on a state itself.
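To make this concrete, here is a rough sketch of a reward built from those two kinds of terms. It is my own simplification and not the environment's actual code: the names `shaping` and `step_reward`, the five-element state layout, and the constants are all assumptions chosen for illustration.

```python
import math


def shaping(state):
    """Hypothetical score for a state: higher means closer, slower, more upright."""
    x, y, vx, vy, angle = state  # assumed layout, for illustration only
    return -100 * math.hypot(x, y) - 100 * math.hypot(vx, vy) - 100 * abs(angle)


def step_reward(prev_state, next_state, main_engine_on, side_engine_on):
    # Term 1: depends on the change of state, not on the state alone.
    reward = shaping(next_state) - shaping(prev_state)
    # Term 2: depends on the action taken (illustrative fuel penalties).
    if main_engine_on:
        reward -= 0.3
    if side_engine_on:
        reward -= 0.03
    return reward
```

With this sketch, firing the main engine while the state stays the same gives a reward of about -0.3, even though nothing about the state got worse. The penalty comes purely from the action.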
We can’t change the environment, but how we model the reward and how we use it are questions for us to consider. Did you check how the reward is used in the assignment? This is a relevant discussion.
Raymond