I don’t fully understand q_network and target_q_network.
It says we don’t want the y target to change on every iteration. But target_q_network’s weights are updated on every iteration, so the y target does change on every iteration? Should it change or not?
On every iteration, gradient descent is performed on q_network’s weights, and a soft update is performed on target_q_network’s weights. Does this mean q_network’s weights are updated sharply on every iteration, while target_q_network’s weights are updated only softly? Then we use the softly updated target_q_network to calculate the target y, and train the sharply updated q_network against it?
It changes whenever the target Q Network gets updated, so it does change.
Rather than saying “To avoid this”, I think it is more accurate to say “To prevent it from changing too fast”.
Yes, softly updated TQN to get target y, for training the sharply updated QN.
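To make the “soft vs. sharp” distinction concrete, here is a minimal NumPy sketch of a soft update. The function name `soft_update` and the rate `TAU = 0.001` are illustrative assumptions, not the course’s exact code; the idea is that the target network’s weights move only a tiny fraction `TAU` toward the Q-Network’s weights each iteration, so the y target changes slowly rather than not at all.

```python
import numpy as np

TAU = 0.001  # soft-update rate (hypothetical value; the point is that it is small)

def soft_update(q_weights, target_weights, tau=TAU):
    # Blend a small fraction of the Q-Network's weights into the target network:
    # w_target <- tau * w_q + (1 - tau) * w_target
    return [tau * w + (1.0 - tau) * tw for w, tw in zip(q_weights, target_weights)]

# Toy example with one weight matrix per "network"
q_w = [np.ones((2, 2))]        # sharply updated by gradient descent each iteration
target_w = [np.zeros((2, 2))]  # only nudged slightly toward q_w each iteration

target_w = soft_update(q_w, target_w)
print(target_w[0])  # every entry moved only 0.001 of the way from 0.0 toward 1.0
```

With `TAU = 0.001`, the target network lags far behind the Q-Network, which is why the y target it produces is stable enough to train against.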
Thanks for your confirmation!
I think now I get it!
Today I completed the Machine Learning Specialization! I want to thank you @rmwkwok! You replied quickly, answered questions with detailed explanations, and taught me how to sometimes find answers by trying things out myself (e.g., how to read the NumPy docs). Among the mentors, you helped me the most! I am sure many students feel the same way! Thank you for all the support!
Congratulations on your achievement, @Jinyan_Liu! I hope I will see you again here in the future!