Q function training


I have a question about the Q function. At first, we randomly initialize Q, and we compute y from this random Q function. We collect 10,000 examples as (x, y) pairs and train Q to fit y, but y itself was computed with the original random Q. How can this possibly give us a different, better Q function? What is the meaning of this? Please explain clearly, thank you! I'm really confused.

Hey there @ryan106001020

Even though the initial Q function is random and inaccurate, it gradually improves as you repeatedly update it.

Here’s how it works:
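First, you use the current (initially random) Q to compute the target values y from the observed rewards and the estimated future Q-values. Then you train the Q function to better approximate these targets and update it. The key point is that each target contains the actual reward R returned by the environment, so the targets carry real information even when Q is random. Repeating this process moves the Q function closer to the true Q values, which makes the agent act better in the environment.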
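A minimal sketch of this loop, assuming a hypothetical 4-state toy environment and a Q table instead of a neural network (`step`, `train_q` and `GOAL` are illustrative names, not from any course code). Each round computes targets from a frozen copy of the current Q, then moves every entry part of the way toward its target. The move into the goal earns a real reward of 1, so that target is correct from the very first round; the other entries pick up correct values as that information propagates backward through the γ max term.

```python
import random

GAMMA = 0.9
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GOAL = 3          # states 0, 1, 2 are ordinary; state 3 is the terminal goal


def step(s, a):
    """Hypothetical deterministic toy environment: returns (next_state, reward, done)."""
    s_next = s + 1 if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == GOAL else 0.0
    return s_next, reward, s_next == GOAL


def train_q(iterations=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    # Start from a completely random Q table, like a randomly initialized Q network.
    q = {(s, a): rng.uniform(-1, 1) for s in range(GOAL) for a in ACTIONS}
    for _ in range(iterations):
        frozen = dict(q)  # targets come from a fixed copy of Q (the role of w^-)
        for (s, a) in q:
            s_next, r, done = step(s, a)
            # y = R + gamma * max_a' Q(s', a'); no future term after reaching the goal
            future = 0.0 if done else max(frozen[(s_next, b)] for b in ACTIONS)
            y = r + GAMMA * future
            # "Training": move Q(s, a) part of the way toward its target y
            q[(s, a)] += lr * (y - q[(s, a)])
    return q


if __name__ == "__main__":
    for seed in (0, 1, 2):
        q = train_q(seed=seed)
        print(seed, {k: round(v, 3) for k, v in sorted(q.items())})
```

Every seed should print the same table (for example Q(2, right) = 1.0, Q(1, right) = 0.9, Q(0, right) = 0.81), which illustrates that the final Q does not depend on the random starting point, only on the rewards and γ.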

Hope it helps! Feel free to ask if you need further assistance.

Hi~
The point I don't understand is this:
since I use an inaccurate Q to estimate the target y, what am I updating toward?

$\hat{y} = Q(s, a; w)$ (prediction)
$y = R + \gamma \max_{a'} \hat{Q}(s', a'; w^-)$ (target)
We compute the loss between y hat and the target y, and use it to update the Q function.
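To make the update concrete, here is a minimal sketch of one training step in the notation above, assuming a hypothetical linear Q function $Q(s, a; w) = w_a \cdot s$ in place of a neural network and plain Python lists for the weights (`q_value` and `train_step` are illustrative names). The targets are computed with the frozen weights `w_minus` and treated as constants, so the gradient flows only through $\hat{y}$ and only `w` changes.

```python
import random

GAMMA = 0.9


def q_value(w, s, a):
    """Hypothetical linear Q function: Q(s, a; w) = w[a] . s, with s a feature vector."""
    return sum(wi * si for wi, si in zip(w[a], s))


def train_step(w, w_minus, batch, lr=0.1):
    """One gradient step on a batch of (s, a, r, s_next, done) tuples.

    Targets y use the frozen weights w_minus; only w is modified in place.
    Returns the mean squared loss measured before the update.
    """
    n_actions = len(w)
    grads = [[0.0] * len(row) for row in w]
    loss = 0.0
    for s, a, r, s_next, done in batch:
        y_hat = q_value(w, s, a)  # prediction from the weights being trained
        future = 0.0 if done else max(q_value(w_minus, s_next, b) for b in range(n_actions))
        y = r + GAMMA * future  # target, treated as a constant
        err = y_hat - y
        loss += err ** 2
        # d/dw[a] of (y_hat - y)^2 is 2 * err * s, because y does not depend on w
        for i, si in enumerate(s):
            grads[a][i] += 2 * err * si
    m = len(batch)
    for a in range(n_actions):
        for i in range(len(w[a])):
            w[a][i] -= lr * grads[a][i] / m
    return loss / m


if __name__ == "__main__":
    rng = random.Random(0)
    w = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]  # 2 actions x 2 features
    w_minus = [row[:] for row in w]  # frozen copy used only for targets
    batch = [
        ([1.0, 0.0], 0, 0.0, [0.0, 1.0], False),
        ([0.0, 1.0], 1, 1.0, [1.0, 1.0], True),
        ([1.0, 1.0], 1, 0.5, [1.0, 0.0], False),
        ([0.5, 0.5], 0, 0.0, [0.0, 1.0], False),
    ]
    for it in range(5):
        print(it, round(train_step(w, w_minus, batch), 4))
```

Because `w_minus` is never touched inside `train_step`, repeated calls drive the loss down toward fixed targets; copying `w` into `w_minus` afterwards would then give new, better targets for the next round.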

Is that correct?
Thank you for replying.

Yes, you are correct! You're updating Q by minimizing the difference (loss) between the predicted Q value $\hat{y}$ and the target value $y$. The target is built from the reward plus the discounted best estimate of future Q values, $\gamma \max_{a'} \hat{Q}(s', a'; w^-)$. Because that estimate uses the separate weights $w^-$, the target stays fixed while you fit $w$ to it, and over many iterations the updates improve the Q function's ability to predict accurate values.
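As a quick worked example with made-up numbers (purely illustrative, not from the course): suppose the observed reward is $R = 0$, $\gamma = 0.9$, the target network estimates $\hat{Q}(s', a'; w^-)$ as 0.5 and 0.2 for the two possible next actions, and the current prediction is $Q(s, a; w) = 0.1$. Then:

$$
\begin{aligned}
y &= R + \gamma \max_{a'} \hat{Q}(s', a'; w^-) = 0 + 0.9 \times \max(0.5,\ 0.2) = 0.45 \\
\hat{y} &= Q(s, a; w) = 0.1 \\
L &= (\hat{y} - y)^2 = (0.1 - 0.45)^2 = 0.1225
\end{aligned}
$$

A gradient step on this loss nudges $Q(s, a; w)$ up toward 0.45, while $w^-$ stays unchanged for that step.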