Hi,
One question in the quiz refers to the return of an algorithm flying a helicopter.
The question describes being in a certain state, and then taking three actions, with different rewards.
According to the correct answer, the discount factor is not applied to the first reward.
In the videos, though, the reward for the current state is undiscounted, but the reward for the first step already has the discount factor applied.
Why the discrepancy? Does it matter?
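To make the two readings concrete, here is a minimal sketch of how I understand them. The rewards and the discount factor are made-up numbers, not the ones from the quiz, and the function name and the first_exponent parameter are just my own labels for illustration.

```python
def discounted_return(rewards, gamma, first_exponent=0):
    """Sum of rewards[k] * gamma ** (first_exponent + k).

    first_exponent=0: the first reward is not discounted.
    first_exponent=1: the first reward is already multiplied by gamma.
    """
    return sum(r * gamma ** (first_exponent + k) for k, r in enumerate(rewards))


# Hypothetical values, chosen only to keep the arithmetic easy to follow.
rewards = [1.0, 2.0, 3.0]
gamma = 0.5

# Reading 1: no discount on the first reward.
# 1*1 + 2*0.5 + 3*0.25
undiscounted_first = discounted_return(rewards, gamma, first_exponent=0)

# Reading 2: the first reward already carries one factor of gamma.
# 1*0.5 + 2*0.25 + 3*0.125
discounted_first = discounted_return(rewards, gamma, first_exponent=1)

print(undiscounted_first)  # 2.75
print(discounted_first)    # 1.375
```

If the only difference really is a shift of one in the exponent, the second value is just gamma times the first, so comparing two action sequences would give the same ranking under either reading. I am not sure that is the whole story, though.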
Thanks,
Yuri