Let’s say a person is fishing, and there comes a moment when the person needs to decide whether to reel or not. How do you think the person will come to that decision? Does the person know the future before making the decision? Or does the person estimate the future before making the decision?
Note the difference between “knowing the future” and “estimating the future”. The former is a gift, a superpower, whereas the latter is just experience.
Indeed, we cannot foresee the future anyway. So we estimate, but based on what? I am not actually a fishing expert, but in the fishing example I would say (1) whether the fishing line has moved, (2) how long it has been moving, and perhaps many more things that you can think of. They are all states, and they are all states that we have already observed. None of them comes from the future; we don’t know any future states. The question is: given the current state s, what is the Q value if the person takes action a (reel or not reel)?
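For reference, here is the usual textbook way to write that Q value. The discussion above does not spell it out, so treat the notation as an assumption: r_t is the reward received t steps after acting, and γ (between 0 and 1) is a discount factor that makes later rewards count less.

$$
Q(s, a) = \mathbb{E}\left[\, \sum_{t=0}^{\infty} \gamma^{t} r_{t} \;\middle|\; s_0 = s,\ a_0 = a \right]
$$

The expectation is the important part: it is an estimate of what tends to follow, built from what has been seen before, not knowledge of what will actually happen.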
The Q function, in the fishing example, is essentially the brain of that person: it processes the states and tells whether reeling or not reeling is going to be more rewarding. Agree?
We can call these observations feedback from the environment. We can also call them the state.
And the Q function is nothing more than an encapsulation of past experience about fishing, right? We never make decisions based on the future, but based on what we have learnt in the past. If that person is very experienced in fishing, then that person’s brain (the Q function) is going to give a better prediction of the Q value of reeling and the Q value of not reeling. From these two values, the person only needs to pick the action with the higher value.
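To make this concrete, here is a minimal Python sketch of the fishing example. Everything in it is made up for illustration: the experience records, the reward numbers, and the helper names `learn_q` and `best_action` are all hypothetical. It also simplifies things by averaging only the immediate reward seen after each (state, action) pair, which is a stand-in for a real Q-learning update rather than the full algorithm.

```python
from collections import defaultdict

# Hypothetical past fishing experience: (state, action, reward) records.
# A state is (line_moved, seconds_moving); all numbers are invented.
past_experience = [
    ((True, 3), "reel", 1.0),
    ((True, 3), "reel", 1.0),
    ((True, 3), "wait", 0.0),
    ((False, 0), "reel", -1.0),
    ((False, 0), "wait", 0.0),
]

ACTIONS = ("reel", "wait")


def learn_q(experience):
    """Estimate Q(s, a) as the average reward observed after taking a in s."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, action, reward in experience:
        totals[(state, action)] += reward
        counts[(state, action)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def best_action(q, state):
    """Pick the action with the highest estimated Q value for this state."""
    # Unseen (state, action) pairs default to 0.0, an arbitrary assumption.
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


q = learn_q(past_experience)
current_state = (True, 3)  # the line has been moving for 3 seconds
print({a: q.get((current_state, a), 0.0) for a in ACTIONS})
print(best_action(q, current_state))
```

Notice that `learn_q` only ever reads records from the past, and `best_action` only looks at the current state. Nothing in the decision touches a future state, which is exactly the point of the analogy.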
Let’s be honest, none of us can make decisions based on the future. At most we can say that, based on our past experience, we believe the future is going to be a certain way. We never really know the future, and we need to accept that. Otherwise, we cannot move on.