Reinforcement learning introduction quiz - helicopter return question


One question in the quiz asks for the return of an algorithm flying a helicopter.
The question describes starting in a certain state and then taking three actions, each yielding a different reward.
The correct answer does not apply the discount factor to the first reward.
In the videos, however, the reward of the current state is shown undiscounted, while the first step after it already has a discount factor applied.
Why the discrepancy? Does it matter?


This is also the case with the Bellman Equation, which doesn't apply the discount to the current state's reward but does apply it to the reward of the first action.
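For concreteness, here is a minimal sketch (my own numbers, not from the quiz) comparing the two conventions for a three-step episode:

```python
# Hypothetical example: three rewards collected over an episode, with
# discount factor gamma. The values are made up for illustration.
gamma = 0.5
rewards = [100, 50, 20]

# Convention in the quiz answer: the first reward is undiscounted.
# G = R1 + gamma*R2 + gamma^2*R3
return_first_undiscounted = sum(gamma**t * r for t, r in enumerate(rewards))

# Convention I thought the video used: discounting starts at step one.
# G = gamma*R1 + gamma^2*R2 + gamma^3*R3
return_first_discounted = sum(gamma**(t + 1) * r for t, r in enumerate(rewards))

print(return_first_undiscounted)  # 100 + 25 + 5   = 130.0
print(return_first_discounted)    # 50 + 12.5 + 2.5 = 65.0
```

Note the second value is exactly gamma times the first, so the choice only rescales every return uniformly; but the two conventions clearly give different numbers for a specific quiz answer, which is what confused me.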

Hi @Yuri_Dulkin,

  1. Can you share the name of the video?
  2. Can you share the timestamp of the video that shows the example you are talking about?
  3. Can you spell out the discrepancy: what value did you expect for which quantity, and what value does the video show for that quantity?