Where does the information to improve Q come from?

Hello Douglas,

Let me ask some clarifying questions:

Can you share a screenshot of the slide that contains the equation? I am not sure which equation you are referring to. You may find the slides here.

I have a feeling that you are speaking about the Bellman equation, which expresses a Q-value as a sum of rewards, each discounted by a power of gamma.
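
For reference, here is the form I have in mind, as a rough sketch. The notation is my assumption and may not match the slide you are looking at: $s$ and $a$ are the current state and action, $s'$ and $a'$ the next ones, $R(s)$ the reward in the current state, and $\gamma$ the discount factor.

$$Q(s, a) = R(s) + \gamma \max_{a'} Q(s', a')$$

If we keep substituting the right-hand side into itself (assuming we act optimally after the first step), it unrolls into the discounted sum, where $R_1$ is the reward in the current state and $R_2, R_3, \dots$ are the rewards that follow:

$$Q(s, a) = R_1 + \gamma R_2 + \gamma^2 R_3 + \cdots$$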

However, in the first post of this thread, I suppose you are speaking about the Q-network. Although the Q-network and the Bellman equation both deal with Q-values, they are not equivalent. Which one should we focus on now?

Are you referring to the information as the “only one component of the sum”?

current one = reward in the current state?

So is this your hypothesis? Is this what you want to talk about in this post? And how does it relate to Reinforcement learning?

Looking at this statement on its own, outside the context of RL, I would agree: if the sum of a series of numbers is positive, I am more likely to pick a positive number from that series, assuming the numbers are Gaussian distributed.
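
If you want to check that intuition yourself, a small simulation is one way to do it. This is only a sketch under my own assumptions, none of which come from the course: the numbers are independent draws from a zero-mean, unit-variance Gaussian, each series has 5 numbers, and the function name `fraction_positive_given_positive_sum` is made up for illustration.

```python
import random


def fraction_positive_given_positive_sum(n_terms=5, n_trials=20000, seed=0):
    """Estimate P(a randomly picked number > 0 | the series sum > 0).

    Assumption: each number is an independent draw from a standard Gaussian.
    """
    rng = random.Random(seed)
    kept = 0  # series whose sum turned out positive
    hits = 0  # of those, how often the randomly picked number was positive
    for _ in range(n_trials):
        series = [rng.gauss(0.0, 1.0) for _ in range(n_terms)]
        if sum(series) > 0:
            kept += 1
            if rng.choice(series) > 0:
                hits += 1
    return hits / kept


if __name__ == "__main__":
    print(fraction_positive_given_positive_sum())
```

Without the condition on the sum, the fraction would be about 0.5 for zero-mean numbers, so a value clearly above 0.5 supports the statement. Note that this says nothing yet about how it relates to RL.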

Let me know :slight_smile:

Cheers,
Raymond