Because the target value y changes on every iteration, the concept of a separate network, called the target network, was introduced. My doubts are:
1. Isn't this still a problem? If we copy the main network's weights to the target network every C time steps, the target y does not remain constant; it still changes after every copy.
2. Also, right after the copying step, Qnew = Q(s,a), so won't the MSE always be low on every iteration? That should not be the case, right, sir?
Can someone please explain the target network?
Hello @Anbu ,
Thanks a lot for the question. I will do my best to help you understand what a target network is in my reply.
The target network is a separate neural network that is used to estimate the target values for the Q-learning update rule. It is a copy of the main network, but its parameters are updated less frequently, which helps stabilize the learning process.
Using a single neural network for both estimating the current Q-values and updating the target Q-values can lead to instability in the learning process. This is because the network’s parameters are constantly changing, causing the target values to shift as well. To address this issue, the concept of a target network is introduced.
The target network is a separate neural network that is periodically updated with the parameters of the main Q-network. This means that the target values used for the Q-learning update rule remain more stable, allowing for a more stable learning process.

For example, consider a reinforcement learning problem where an agent is learning to navigate a maze. The agent uses a Q-network to estimate the Q-values for each possible action in its current state. To update the Q-values, the agent also needs to estimate the target Q-values for the next state. Instead of using the same Q-network for this purpose, the agent uses a separate target network, which is updated less frequently. This helps stabilize the learning process and allows the agent to learn more effectively.
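To make the mechanism concrete, here is a minimal sketch of the update loop. This is not the course's actual DQN code: it uses a tiny tabular Q-table in place of real neural networks, and the environment transitions, the copy interval `C`, and all hyperparameter values are illustrative assumptions. The key point it demonstrates is that the target y is computed from the frozen target network, which only changes every `C` steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny setting: 4 states, 2 actions. Tabular Q-values
# stand in for the main and target Q-networks.
n_states, n_actions = 4, 2
gamma = 0.95   # discount factor
alpha = 0.1    # learning rate
C = 100        # copy main -> target every C steps

q_main = np.zeros((n_states, n_actions))
q_target = q_main.copy()  # target network starts as a copy of the main one

for step in range(1, 501):
    # Fake experience tuple (s, a, r, s') in place of a real environment.
    s = rng.integers(n_states)
    a = rng.integers(n_actions)
    r = float(rng.random())
    s_next = rng.integers(n_states)

    # Target y = r + gamma * max_a' Q_target(s', a').
    # It is computed from the FROZEN target network, so it stays
    # fixed between copies even though q_main changes every step.
    y = r + gamma * q_target[s_next].max()

    # Gradient-style update on the main network only.
    q_main[s, a] += alpha * (y - q_main[s, a])

    # Every C steps, refresh the target network from the main one.
    if step % C == 0:
        q_target = q_main.copy()
```

Note that between copies, `q_target` is untouched, so the regression targets are stationary; only at every C-th step do they shift, which is much gentler on the optimizer than targets that move on every single update.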
In summary, a target network is a periodically updated copy of the main Q-network used in deep reinforcement learning algorithms. Because its parameters change less frequently, it provides more stable target values for the Q-learning update rule, which stabilizes the learning process.
Please feel free to post a followup question if you feel uncertain about what a target network is.
Dear Mr Can Koz,
Could you please elaborate on this statement using the Lunar Lander example?
I am confused by this statement, especially the keywords "instability" and "constantly changing".