Reinforcement learning introduction quiz - helicopter return question

Yuri_Dulkin · September 10, 2022, 7:09pm

Hi,

One question in the quiz asks for the return of an algorithm flying a helicopter.
The question describes being in a certain state and then taking three actions, each with a different reward.
The correct answer does not apply the discount factor to the first reward.
In the videos, though, the reward for the current state is not discounted, but the reward for the first step already has the discount factor applied.
Why the discrepancy? Does it matter?

Thanks,
Yuri

Yuri_Dulkin · September 10, 2022, 7:29pm

This is also the case with the Bellman Equation, which doesn’t apply the discount for the current state, but applies it to the first action.
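
To make the two conventions concrete, here is a minimal sketch. It assumes the only difference is where the exponent of the discount factor starts. The reward values and `gamma` are made up for illustration and are not the quiz's numbers, and the function names are hypothetical.

```python
def discounted_return(rewards, gamma):
    """First reward undiscounted: r0 + gamma*r1 + gamma^2*r2 + ..."""
    return sum(gamma**k * r for k, r in enumerate(rewards))


def shifted_return(rewards, gamma):
    """First reward already discounted: gamma*r0 + gamma^2*r1 + ..."""
    return sum(gamma**(k + 1) * r for k, r in enumerate(rewards))


# Hypothetical rewards for three consecutive steps (not from the quiz).
rewards = [-100, -100, 1000]
gamma = 0.25

print(discounted_return(rewards, gamma))  # -62.5
print(shifted_return(rewards, gamma))     # -15.625
```

Under this assumption the second value is just `gamma` times the first, so the numbers differ, but for a positive `gamma` the ranking of different action sequences stays the same.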

rmwkwok · September 10, 2022, 10:20pm

Hi @Yuri_Dulkin,

Can you share the name of the video?
Can you share the timestamp of the video that shows the example you are talking about?
Can you tell me the discrepancy: what is your “expected value” for “what thing”, and what is the “video’s value” for “that thing”?

Raymond
