Hi, I understand that we keep a separate Q_hat network so that the y_target values stay stable during learning, and that after each training iteration (every C time steps) we nudge Q_hat’s parameters toward Q’s parameters with a soft update.
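
For concreteness, here is a minimal sketch of the soft update as I understand it: each Q_hat parameter becomes tau * w + (1 - tau) * w_hat. The function name `soft_update`, the value TAU = 0.001, and the toy weight lists are my own assumptions for illustration, not taken from the lectures or labs. The loop holds Q's weights fixed, which is a simplification, since in real training Q keeps changing between updates.

```python
TAU = 0.001  # assumed soft-update rate, tau << 1

def soft_update(target_params, q_params, tau=TAU):
    """Return tau * w + (1 - tau) * w_hat for each pair of parameters."""
    return [tau * w + (1.0 - tau) * w_hat
            for w_hat, w in zip(target_params, q_params)]

def max_gap(target_params, q_params):
    """Largest absolute difference between corresponding parameters."""
    return max(abs(w - w_hat) for w_hat, w in zip(target_params, q_params))

# Toy stand-ins for the two networks' weights (hypothetical values).
q_params = [1.0, -2.0, 0.5]      # Q's weights, held fixed in this sketch
target_params = [0.0, 0.0, 0.0]  # Q_hat's weights

initial_gap = max_gap(target_params, q_params)
for _ in range(1000):
    target_params = soft_update(target_params, q_params)
gap = max_gap(target_params, q_params)

print(initial_gap, gap)
```

In this fixed-Q toy case, each update multiplies the gap by (1 - tau), so after k updates it is (1 - tau)^k times the initial gap. That alone does not settle what happens when Q is also being trained.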

My question is: even though the soft update uses tau << 1, do the two networks’ parameters eventually converge? I don’t think this was discussed in either the lectures or the labs.