Q function training

ryan106001020 · August 20, 2024, 6:10am

I have a question about the Q function. At first, we randomly guess Q, and we get y from this random Q function. We collect 10,000 examples, pair them as (x, y), and try to update Q to fit y, but y itself was computed with the original random Q. How can that lead to a different, better Q function? What does it mean? Please explain clearly, thank you! I'm really confused.

Alireza_Saei · August 20, 2024, 6:36am

Hey there @ryan106001020

Even though the initial Q function is random and inaccurate, it slowly improves as it is updated iteratively.

Here’s how it works:
First, you use the random Q to estimate the target values (y) from the rewards and the future Q-values. Then you train the Q function to better approximate these targets. Because every target contains a real reward observed from the environment, each round of updates moves the Q function closer to the true Q values, making the agent smarter in the environment.
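
A minimal sketch of that loop, in case it helps. The four-state chain, the reward of 1 for reaching the last state, `GAMMA = 0.5`, and the names `step` and `fit_q` are all assumptions made up for illustration, not the course's lab code. The "training" step is idealized as a perfect fit to the targets, so only the effect of the targets themselves is visible.

```python
import random

GAMMA = 0.5
N_STATES = 4          # hypothetical chain: states 0..3, state 3 is terminal
ACTIONS = (0, 1)      # 0 = left, 1 = right


def step(s, a):
    """Return (reward, next_state, done) for the toy chain."""
    s_next = max(s - 1, 0) if a == 0 else s + 1
    done = s_next == N_STATES - 1
    return (1.0 if done else 0.0), s_next, done


def fit_q(n_rounds=50, seed=0):
    rng = random.Random(seed)
    # Random initial guess for every non-terminal (state, action) pair.
    q = {(s, a): rng.uniform(-1, 1)
         for s in range(N_STATES - 1) for a in ACTIONS}
    for _ in range(n_rounds):
        targets = {}
        for (s, a) in q:
            r, s_next, done = step(s, a)
            # Targets are built from the CURRENT (possibly wrong) Q ...
            future = 0.0 if done else max(q[(s_next, b)] for b in ACTIONS)
            # ... plus the real reward, which is where new information enters.
            targets[(s, a)] = r + GAMMA * future
        # "Training": here Q is fitted to the targets exactly.
        q = targets
    return q


q = fit_q()
print(q[(2, 1)], q[(1, 1)], q[(0, 1)])
```

After the first round, the entry for "move right from state 2" is already correct, because its target is just the real reward. In later rounds that correct value spreads backwards to the earlier states, and the influence of the random starting guess shrinks by a factor of `GAMMA` each round.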

Hope it helps! Feel free to ask if you need further assistance.

ryan106001020 · August 20, 2024, 7:03am

Hi~
The point I don’t understand is this:
since I use an inaccurate Q to compute the target y, what am I updating toward?

Q(s, a; w) = y hat
R + γ max_a′ Q̂(s′, a′; w⁻) = y target
We calculate the loss between y hat and y target and use it to update the Q function.

Is that correct?
Thank you for replying.

Alireza_Saei · August 20, 2024, 7:12am

Yes, you are correct! You’re updating Q by minimizing the difference (the loss) between the predicted Q value (y hat) and the target value (y). The target y combines the observed reward with the current best estimate of future Q values, γ max_a′ Q̂(s′, a′). Over many iterations, these updates improve the Q function’s ability to predict accurate values.
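
To make one such update concrete, here is a small sketch under stated assumptions. The linear `q_value` (one weight per action times a scalar state), the learning rate, `GAMMA = 0.9`, and the sample transition are all hypothetical. A real implementation would use a neural network and a framework's optimizer. The only point is that the target is computed with the frozen weights `w_minus`, while the gradient step changes `w`.

```python
GAMMA = 0.9


def q_value(w, s, a):
    """Hypothetical linear Q: one weight per action times a scalar state."""
    return w[a] * s


def td_target(w_minus, r, s_next, done):
    """y target = R + gamma * max_a' Q_hat(s', a'; w_minus)."""
    if done:
        return r
    return r + GAMMA * max(q_value(w_minus, s_next, a)
                           for a in range(len(w_minus)))


def sgd_step(w, w_minus, transition, lr=0.1):
    s, a, r, s_next, done = transition
    y_hat = q_value(w, s, a)                  # prediction from current weights
    y = td_target(w_minus, r, s_next, done)   # treated as a constant
    loss = (y_hat - y) ** 2
    grad = 2 * (y_hat - y) * s                # d(loss)/d(w[a]) for the linear Q
    new_w = list(w)
    new_w[a] -= lr * grad
    return new_w, loss


w = [0.0, 0.0]
w_minus = list(w)                             # frozen copy used for targets
transition = (1.0, 1, 1.0, 2.0, False)        # (s, a, R, s', done)

w, loss_before = sgd_step(w, w_minus, transition)
_, loss_after = sgd_step(w, w_minus, transition)
print(loss_before, loss_after)
```

The loss on the same transition is smaller on the second call, since `w` has moved toward a target that includes the observed reward.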
