Soft update is the same as Adjusting Alpha?

Both visually (Calculator Suite - GeoGebra) and algebraically (picture) it seems to me that doing a Soft Update is the same as decreasing alpha.

Is that correct?

Thanks

Hi!

I would need a more precise definition of soft update. But, as shown in the image, both expressions are equivalent; alpha just varies over a different range than c, governed by the relation alpha = 1 - c.
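A quick scalar check of that relation (made-up values, just to illustrate, not anything from the course):

```python
# One weight, one fixed gradient-descent update "step" (made-up values).
w = 2.0
step = -0.5
c = 0.9
alpha = 1 - c  # the relation above

via_alpha = w + alpha * step                # scaled-step form
via_average = c * w + (1 - c) * (w + step)  # weighted-average form

# Both forms give the same updated weight.
assert abs(via_alpha - via_average) < 1e-12
```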

Cheers,

Hi!
Thank you for your answer.
Andrew defines Soft Update on his Week 3 lecture about improvements to the reinforcement learning gradient descent algorithm titled “Algorithm refinement: Mini-batch and soft updates (optional)”.
He basically says that, when doing a Soft Update, we set the neural network parameters to a weighted sum of the old values and the new ones suggested by a gradient descent iteration.
So we update our parameters only partially, by conserving a percentage of the old values.
As he explains on the practice lab later, the objective is to improve the stability of our Q neural network by changing the ‘y’ values just a little each time.
I had the feeling this process was similar to decreasing alpha, and it seems to me they have the same effect.
But because Andrew introduced it as a new concept, I posted the question here to check if I’m right or if I’m missing something.
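For what it's worth, here is how I picture the soft update in code (a minimal sketch with made-up weights and an illustrative c; not the lab's actual code):

```python
# Minimal sketch of the soft update rule: keep a fraction c of the old
# target weights, take the rest from the freshly trained Q-network.
# Each "network" is just a list of scalar weights here.

def soft_update(tqn_weights, qn_weights, c):
    """Blend the target-Q-network weights toward the Q-network weights."""
    return [c * w_t + (1 - c) * w_q
            for w_t, w_q in zip(tqn_weights, qn_weights)]

qn = [1.0, 2.0]    # freshly trained Q-network weights (made up)
tqn = [0.0, 0.0]   # target-Q-network weights (made up)

tqn = soft_update(tqn, qn, c=0.99)
print(tqn)  # each weight moved only 1% of the way toward the QN
```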

Hello @dncjorge!

Interesting thought! I think we need to replace step with alpha \times step in the first line of your 2. Agree?

I think your derivation is only correct in the following situation:

  1. randomly initialize the Q-Network (QN) and set the target-Q-network’s (TQN) parameter to be the same as the QN’s.

  2. train the QN once

  3. soft update the TQN with the QN

However, after the first soft update, the QN and the TQN will become different, and in that case we can't say the two x in the first line of your 2 are the same x. The QN and the TQN are of course correlated, and we can keep track of their correlation through the c parameter and all the different steps, but the relationship might look complex… (I didn't do it myself)

Cheers,
Raymond

Hi @rmwkwok!
Thank you for the correction in equation 2.
I’m now reading the practice section and I understand what you said.

QN is updated like this:
1- (new) QN = (old) QN + alpha * GDUpdate(QN)
where GDUpdate(QN) is the gradient-descent update (step) calculated using the QN weights

TQN is updated like this:
2- TQN = c * (old) TQN + (1-c) * (new) QN

substituting (new) QN using 1, we have:

2- TQN = c * (old) TQN + (1-c) * [(old) QN + alpha * GDUpdate(QN)]

but as [c * (old) TQN] won't cancel with [-c * (old) QN] (as I assumed before, when I treated the two as equal), equation 2 is different from a simple update to TQN using the gradient-descent step calculated from QN:
3- TQN = (old) TQN + alpha * GDUpdate(QN)

We can’t arrive at eq. 3 from eq. 2 exactly for the reason you said: (Old) TQN will be different from (Old) QN at the time of the update.
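A quick numeric check of that conclusion (made-up scalar weights, not the lab's values): once (old) TQN differs from (old) QN, equations 2 and 3 give different results.

```python
c, alpha = 0.9, 0.1
qn_old, tqn_old = 1.0, 0.7   # different after the first soft update
gd_update = -0.5             # gradient-descent step computed from QN

qn_new = qn_old + alpha * gd_update        # equation 1
tqn_eq2 = c * tqn_old + (1 - c) * qn_new   # equation 2 (soft update)
tqn_eq3 = tqn_old + alpha * gd_update      # equation 3 (direct GD step on TQN)

# The two targets disagree precisely because tqn_old != qn_old.
assert abs(tqn_eq2 - tqn_eq3) > 1e-6
print(tqn_eq2, tqn_eq3)
```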

I believe I got a new insight from all this discussion:

Doing a conventional iteration of gradient descent on a neural network is the same as updating it to be a weighted average of itself and its new updated version (the update being calculated with its own weights in gradient descent).
What controls the balance of this weighted average is the value of alpha, if we write it traditionally in 'step' notation, or the percentages defined by 'c' if we write it as
final_new = c*old + (1-c) * new(updated with some alpha).
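In scalar form (illustrative numbers), that reads: blending a weight with its own updated version is the same as taking one plain gradient step with the effective rate (1-c)*alpha.

```python
w, step, alpha, c = 2.0, -0.5, 0.2, 0.75  # made-up values

updated = w + alpha * step            # "new" version of the same network
blended = c * w + (1 - c) * updated   # final_new = c*old + (1-c)*new
direct = w + (1 - c) * alpha * step   # one plain step with rate (1-c)*alpha

assert abs(blended - direct) < 1e-9
```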

But in the case of QN and TQN, as we are discussing, a step of gradient descent with Soft Update makes TQN a weighted average of itself not with its own updated version, but with QN's new updated version (the update being calculated by gradient descent using the QN weights, not TQN's own weights).

do you agree?

thank you for the help!

Douglas

Hello Douglas again,

Hmmm, you are speaking as if I already know what the updated value will be before I have actually updated it. Leaving aside whether you can write this equation mathematically, does it really make sense? (Sorry to be a bit blunt here, but I hope this delivers my doubt clearly, for the sake of an effective discussion)

I can’t prove this.

Agreed.

Cheers,
Raymond

Thanks @rmwkwok, I realized the definition was ambiguous, but I've left it as is for the general idea.
There is a correct way to express it, but it's not worth writing out because my initial doubts are already answered.
Thanks for your help.

Douglas


You are welcome Douglas!

Raymond