Hi all, I’ve finished the course… while trying to understand the training algorithms for RL, I came across the soft update step, which transfers the weights from the q-network to the target network after backprop. I’m puzzled by the formula used for doing this. Since the video lecture made no mention of this formula, could someone please advise me? Why isn’t it just a simple full transfer of weights from the q-network to the target network, instead of using this funny TAU? Here’s the function that does the soft update in utils.py:
def update_target_network(q_network, target_q_network):
    # Nudge each target weight a small step toward the matching q-network weight.
    for target_weights, q_net_weights in zip(target_q_network.weights, q_network.weights):
        target_weights.assign(TAU * q_net_weights + (1.0 - TAU) * target_weights)
TAU is defined as:
TAU = 1e-3 # soft update parameter
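To make sure I’m reading the formula correctly, I wrote it out: each target weight is updated as target = TAU * q + (1 - TAU) * target, so with TAU = 1e-3 the target network only moves 0.1% of the way toward the q-network on each update, while TAU = 1.0 would be the full transfer I was expecting. Here’s a small standalone sketch I put together to convince myself (plain Python with made-up scalar weights, not the course code):

TAU = 1e-3  # soft update parameter, same value as in utils.py

def soft_update(q_weights, target_weights, tau=TAU):
    # Blend each target weight toward the matching q-network weight.
    return [tau * q + (1.0 - tau) * t for q, t in zip(q_weights, target_weights)]

q_net = [1.0, 2.0]   # made-up q-network weights
target = [0.0, 0.0]  # made-up target weights

print(soft_update(q_net, target))           # tiny step toward q: [0.001, 0.002]
print(soft_update(q_net, target, tau=1.0))  # full copy: [1.0, 2.0]

So mechanically I can see what the formula does; it’s the reasoning behind it that escapes me.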
I don’t quite understand why it isn’t just a full transfer of the weights over to the target network… please advise me… thank you very much