We use target_q_network to make predictions (the targets y), and then we train q_network on this new data. After training q_network, we update target_q_network using a soft update (according to my understanding).
Couldn't we have done this with just one NN, or am I missing the whole point?
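For reference, the loop described above can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the assignment's code: the helper names `td_target` and `soft_update`, the flat weight lists, and the values of `gamma` and `tau` are all hypothetical.

```python
# Minimal sketch of the two-network update. All names and values are
# illustrative assumptions, not taken from the course assignment.

def td_target(reward, done, next_q_from_target, gamma=0.995):
    """Target y computed from the *target* network's Q-values for the next state.

    y = r                              if the episode ended
    y = r + gamma * max_a' Q_target    otherwise
    """
    if done:
        return reward
    return reward + gamma * max(next_q_from_target)


def soft_update(target_weights, q_weights, tau=1e-3):
    """Move each target weight a small step toward the trained weight.

    w_target <- tau * w_q + (1 - tau) * w_target
    """
    return [tau * w_q + (1.0 - tau) * w_t
            for w_q, w_t in zip(q_weights, target_weights)]


# Toy numbers: q_network has just been trained, target_q_network lags behind.
q_weights = [1.0, 2.0]
target_weights = [0.0, 0.0]

y = td_target(reward=1.0, done=False, next_q_from_target=[0.5, 2.0], gamma=0.5)
target_weights = soft_update(target_weights, q_weights, tau=0.1)
```

With a small `tau`, the targets `y` change only slowly from one training step to the next, whereas with a single network they would shift every time the weights are updated.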
Hello @youssef_bayoumi,
You have given a brief description of how the training involves the target Q network, but not yet your understanding of its advantages. Obviously, the existence of the target Q network is what enables us to do the soft update, and with that in mind, please
- review the lecture "Algorithm refinement: Mini-batch and soft updates (optional)" starting from ~7:50 to the end, and
- review section 6.1 of the week's assignment.
During your review, write down a list of advantages. That list by itself will go some way toward answering why we have 2 NNs that are connected via the soft update.
If you have any follow-up, please share the list of advantages so we can discuss based on your latest understanding.
Cheers,
Raymond