Inconsistent definition for the Bellman equations

Hello Kaitian,

Congratulations on making it to the last lab. This RL lab is my favourite lab of the specialization, so I also went and read the underlying code, and indeed the reward returned from the .step function depends on both the current and the next state, so I agree that it is more like a reward from the next state.

However, this also brings up an interesting point: in this case, a state doesn't always have the same reward, because we need to know two consecutive states to calculate the reward. How would we assign that reward? To the current state, or to the next state? Sounds like it could be controversial, doesn't it?
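To make that concrete, here is a minimal sketch (my own illustration, not the lab's actual code) of a reward that can only be computed once both states are known, so it isn't a property of a single state. The goal position and the distance-based reward are assumptions for the example:

```python
import numpy as np

# Hypothetical reward that needs both the current and the next state:
# e.g. reward the agent for moving closer to a goal position.
GOAL = np.array([0.0, 0.0])

def reward(state, next_state):
    """Positive if the step brought us closer to GOAL, negative otherwise."""
    return np.linalg.norm(state - GOAL) - np.linalg.norm(next_state - GOAL)

s  = np.array([3.0, 4.0])   # current state
s2 = np.array([0.0, 4.0])   # next state after taking some action
print(reward(s, s2))        # 1.0 here, but the same s would give a
                            # different number with a different next state
```

So the same current state can yield different rewards depending on where the transition lands, which is exactly why it feels ambiguous whether the reward "belongs" to the current or the next state.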

But let’s put this aside for a while and look at another fact: the loss function we use to train the DQN doesn’t have to be the Bellman equation, no matter how much our lab’s loss function looks like it. With that relaxation, that we are free to use any sensible loss function, the inconsistency should be gone, right? I personally like the loss function the lab is using because I want my DQN to learn what reward it gets by taking this action at this state. That’s it. That’s my rationale for accepting the loss function while still being happy with the Bellman equation.
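For reference, here is roughly the kind of Bellman-style target such a loss is built around (again my own sketch, not the lab's code; the discount factor and the toy numbers are assumptions). The network is trained to regress towards y = r + γ·max_a' Q(s', a'), which borrows the structure of the Bellman equation but is really just one of many valid regression targets:

```python
import numpy as np

gamma = 0.995  # discount factor (an assumed value for illustration)

def dqn_targets(rewards, next_q_values, done_flags):
    """Bellman-style targets: y = r + gamma * max_a' Q(s', a'),
    with the bootstrap term dropped on terminal transitions."""
    return rewards + gamma * np.max(next_q_values, axis=1) * (1.0 - done_flags)

# Toy batch of 2 transitions: rewards, Q-values of the next states, done flags.
rewards       = np.array([1.0, -0.5])
next_q_values = np.array([[0.2, 0.8], [0.1, 0.3]])
done_flags    = np.array([0.0, 1.0])

y = dqn_targets(rewards, next_q_values, done_flags)
# The training loss is then an ordinary regression loss,
# e.g. mean((Q(s, a) - y) ** 2).
print(y)  # [1.796 -0.5]
```

Nothing forces us to use exactly this target; it just happens to be a convenient one that reuses the Bellman structure, which is why the resemblance is so strong without the two having to be identical.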

Raymond