Both visually (Calculator Suite - GeoGebra) and algebraically (picture) it seems to me that doing a Soft Update is the same as decreasing alpha.
Is that correct?
Thanks
Hi!
I would need a more precise definition of soft update. But, as shown in the image, both expressions are equivalent; alpha just varies over a different range than c, governed by the relation alpha = 1 - c.
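A quick numeric check of the claimed equivalence, using made-up toy numbers (not from the course): a plain gradient step with learning rate alpha matches the weighted average of the old weights and the fully updated weights whenever alpha = 1 - c.

```python
# Toy values only, for illustration.
w, step, c = 3.0, 0.8, 0.9
alpha = 1 - c

lhs = w - alpha * step                 # plain gradient step with alpha
rhs = c * w + (1 - c) * (w - step)     # weighted average of old and updated weights

print(lhs, rhs)  # both 2.92
```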
Cheers,
Hi!,
Thank you for your answer.
Andrew defines Soft Update in his Week 3 lecture about improvements to the reinforcement learning gradient descent algorithm, titled "Algorithm refinement: Mini-batch and soft updates (optional)".
He basically says that, when doing a Soft Update, we set the neural network parameters to a weighted sum of the old values and the new ones suggested by a gradient descent iteration.
So we update our parameters only partially, by conserving a percentage of the old values.
As he explains in the practice lab later, the objective is to improve the stability of our Q neural network by changing the "y" values just a little each time.
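The update rule described above can be sketched as follows; the weight values here are made up for illustration, and c is the fraction of the old target weights kept (in practice the blend factor is chosen close to 1 so the target changes only a little each time):

```python
# Hypothetical weight vectors standing in for the two networks' parameters.
q_weights = [0.5, -1.2, 0.3]        # Q-network weights after a gradient step
target_weights = [0.4, -1.0, 0.2]   # target Q-network weights

c = 0.99  # keep 99% of the old target weights, blend in 1% of the new ones

# Soft update: each target weight moves only slightly toward the Q-network.
target_weights = [c * t + (1 - c) * q
                  for t, q in zip(target_weights, q_weights)]
```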
I had the feeling this process was similar to decreasing alpha, and it seems to me they have the same effect.
But because Andrew introduced it as a new concept, I posted the question here to check if I'm right or if I'm missing something.
Hello @dncjorge!
Interesting thought! I think we need to replace step with alpha \times step in the first line of your 2. Agree?
I think your derivation is only correct in the following situation:
randomly initialize the Q-Network (QN) and set the target Q-network's (TQN) parameters to be the same as the QN's.
train the QN once
soft update the TQN with the QN
However, after the first soft update, the QN and the TQN will become different, and in that case we can't say the two x in the first line of your 2 are the same x. The QN and the TQN are of course correlated, and we can keep track of their correlation through the c parameter and all the different steps, but the relationship might look complex… (I didn't do it myself)
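This divergence is easy to see with toy numbers (made up here, not from the lecture): even if QN and TQN start equal, they differ after the very first soft update, so the "(Old)" terms can never be assumed equal again.

```python
qn, tqn = 1.0, 1.0      # both networks initialized to the same value
c, alpha = 0.9, 0.1

def gd_step(w):
    # Stand-in for one gradient-descent update; pretend the loss is w**2,
    # so the gradient is 2 * w.
    return w - alpha * 2 * w

qn = gd_step(qn)                 # train the QN once  -> 0.8
tqn = c * tqn + (1 - c) * qn     # first soft update  -> 0.98

# From here on qn != tqn, so the cancellation discussed above cannot happen.
print(qn, tqn)  # 0.8 0.98
```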
Cheers,
Raymond
Hi @rmwkwok!
Thank you for the correction in equation 2.
Iām now reading the practice section and I understand what you said.
QN is updated like this:
1- (new) QN = (Old) QN + alpha * GDUpdate(QN)
where GDUpdate is the gradient-descent update (step) calculated using the QN weights
TQN is updated like this:
2- TQN = c * (Old) TQN + (1-c) * (new) QN,
substituting [(new) QN] using 1, we have:
2- TQN = c * (Old) TQN + (1-c) *[ (Old) QN + alpha * GDUpdate(QN)]
but as [c * (Old) TQN] won't cancel with [-c * (Old) QN] (like I thought before), equation 2 is different from a simple update to TQN using the gradient-descent update calculated from QN:
3- TQN = (Old) TQN + alpha * GDUpdate(QN)
We can't arrive at eq. 3 from eq. 2 exactly for the reason you said: (Old) TQN will be different from (Old) QN at the time of the update.
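The difference between eq. 2 and eq. 3 can be checked numerically with made-up values (assuming the two networks have already diverged after one soft update):

```python
old_qn, old_tqn = 0.8, 0.98   # toy values; already diverged
c, alpha = 0.9, 0.1
step = -0.5                   # stand-in for GDUpdate(QN)

new_qn = old_qn + alpha * step            # eq. 1
eq2 = c * old_tqn + (1 - c) * new_qn      # eq. 2 (soft update of TQN)
eq3 = old_tqn + alpha * step              # eq. 3 (plain update of TQN)

print(eq2, eq3)  # 0.957 vs 0.93: not the same once old_tqn != old_qn
```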
I believe I got a new insight from all this discussion:
Doing a conventional iteration of gradient descent in a neural network is the same as updating it to be a weighted average of itself and its new updated version (the update being calculated with its own weights in gradient descent).
What controls the balance of this weighted average is the value of alpha if we write it traditionally in "step" notation, or the percentages defined by "c" if we write it as
final_new = c*old + (1-c) * new(updated with some alpha).
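This identity can be verified with toy numbers: a plain gradient step with effective learning rate alpha = (1 - c) * alpha_inner gives exactly the weighted average of the old weights and the weights updated with the inner alpha (all values below are made up for illustration).

```python
w, step = 2.0, -0.4
c, alpha_inner = 0.75, 0.2
alpha = (1 - c) * alpha_inner   # effective learning rate 0.05

direct   = w + alpha * step                            # traditional "step" form
weighted = c * w + (1 - c) * (w + alpha_inner * step)  # weighted-average form

print(direct, weighted)  # both 1.98
```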
But in the case of QN and TQN that we are discussing, a step of gradient descent with Soft Update makes TQN a weighted average not of itself and its own updated version, but of itself and QN's new updated version (the update being calculated by gradient descent using the QN weights, not TQN's own weights).
Do you agree?
Thank you for the help!
Douglas
Hello Douglas again,
Hmmm, you are speaking as if I already know what the updated value will be before I have actually updated it. Let's put aside whether you can mathematically write this equation: does this really make sense? (Sorry, I am a bit straightforward here, but I hope this will deliver my doubt clearly for the sake of an effective discussion.)
I canāt prove this.
Agreed.
Cheers,
Raymond
Thanks @rmwkwok, I realized the definition was ambiguous, but I've left it for the general idea.
There's a right way to express it, but it's not worth writing because my initial doubts are already answered.
Thanks for your help.
Douglas
You are welcome Douglas!
Raymond