Soft update is the same as Adjusting Alpha?

Both visually (Calculator Suite - GeoGebra) and algebraically (picture) it seems to me that doing a Soft Update is the same as decreasing alpha.

Is that correct?

Thanks

Hi!

I would need a more precise definition of soft update. But, as shown in the image, both expressions are equivalent; alpha just varies over a different range than c, governed by the relation alpha = 1 - c.
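A quick scalar check of that relation (made-up values, just to illustrate, not anything from the course):

```python
# One weight, one fixed gradient-descent update "step" (made-up values).
w = 2.0
step = -0.5
c = 0.9
alpha = 1 - c  # the relation above

via_alpha = w + alpha * step                # scaled-step form
via_average = c * w + (1 - c) * (w + step)  # weighted-average form

# Both forms give the same updated weight.
assert abs(via_alpha - via_average) < 1e-12
```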

Cheers,

Hi!
Thank you for your answer.
Andrew defines Soft Update on his Week 3 lecture about improvements to the reinforcement learning gradient descent algorithm titled “Algorithm refinement: Mini-batch and soft updates (optional)”.
He basically says that, when doing a Soft Update, we set the neural network parameters to a weighted sum of the old values and the new ones suggested by a gradient descent iteration.
So we update our parameters only partially, by conserving a percentage of the old values.
As he explains on the practice lab later, the objective is to improve the stability of our Q neural network by changing the ‘y’ values just a little each time.
I had the feeling this process was similar to decreasing alpha, and it seems to me they have the same effect.
But because Andrew introduced it as a new concept, I posted the question here to check if I’m right or if I’m missing something.
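For what it's worth, here is how I picture the soft update in code (a minimal sketch with made-up weights and an illustrative c; not the lab's actual code):

```python
# Minimal sketch of the soft update rule: keep a fraction c of the old
# target weights, take the rest from the freshly trained Q-network.
# Each "network" is just a list of scalar weights here.

def soft_update(tqn_weights, qn_weights, c):
    """Blend the target-Q-network weights toward the Q-network weights."""
    return [c * w_t + (1 - c) * w_q
            for w_t, w_q in zip(tqn_weights, qn_weights)]

qn = [1.0, 2.0]    # freshly trained Q-network weights (made up)
tqn = [0.0, 0.0]   # target-Q-network weights (made up)

tqn = soft_update(tqn, qn, c=0.99)
print(tqn)  # each weight moved only 1% of the way toward the QN
```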

Hello @dncjorge!

Interesting thought! I think we need to replace step with alpha \times step in the first line of your 2. Agree?

I think your derivation is only correct in the following situation:

  1. randomly initialize the Q-Network (QN) and set the target-Q-network’s (TQN) parameter to be the same as the QN’s.

  2. train the QN once

  3. soft update the TQN with the QN

However, after the first soft update, the QN and the TQN will become different, and in that case we can't say the two x in the first line of your 2 are the same x. The QN and the TQN are of course correlated, and we can keep track of their correlation through the c parameter and all the different steps, but the relationship might look complex… (I didn't do it myself)

Cheers,
Raymond

Hi @rmwkwok!
Thank you for the correction in equation 2.
I’m now reading the practice section and I understand what you said.

QN is updated like this:
1- (new) QN = (old) QN + alpha * GDUpdate(QN)
where GDUpdate(QN) is the gradient-descent update (step) calculated using the QN weights

TQN is updated like this:
2- TQN = c * (old) TQN + (1-c) * (new) QN

substituting (new) QN using 1, we have:

2- TQN = c * (old) TQN + (1-c) * [(old) QN + alpha * GDUpdate(QN)]

but as [c * (old) TQN] won't cancel with [-c * (old) QN] (as I assumed before, when I treated the two as equal), equation 2 is different from a simple update to TQN using the gradient-descent step calculated from QN:
3- TQN = (old) TQN + alpha * GDUpdate(QN)

We can’t arrive at eq. 3 from eq. 2 exactly for the reason you said: (Old) TQN will be different from (Old) QN at the time of the update.
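A quick numeric check of that conclusion (made-up scalar weights, not the lab's values): once (old) TQN differs from (old) QN, equations 2 and 3 give different results.

```python
c, alpha = 0.9, 0.1
qn_old, tqn_old = 1.0, 0.7   # different after the first soft update
gd_update = -0.5             # gradient-descent step computed from QN

qn_new = qn_old + alpha * gd_update        # equation 1
tqn_eq2 = c * tqn_old + (1 - c) * qn_new   # equation 2 (soft update)
tqn_eq3 = tqn_old + alpha * gd_update      # equation 3 (direct GD step on TQN)

# The two targets disagree precisely because tqn_old != qn_old.
assert abs(tqn_eq2 - tqn_eq3) > 1e-6
print(tqn_eq2, tqn_eq3)
```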

I believe I got a new insight from all this discussion:

Doing a conventional iteration of gradient descent on a neural network is the same as updating it to be a weighted average of itself and its new updated version (the update being calculated with its own weights in gradient descent).
What controls the balance of this weighted average is the value of alpha, if we write it traditionally in 'step' notation, or the percentages defined by 'c' if we write it as
final_new = c*old + (1-c) * new(updated with some alpha).
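In scalar form (illustrative numbers), that reads: blending a weight with its own updated version is the same as taking one plain gradient step with the effective rate (1-c)*alpha.

```python
w, step, alpha, c = 2.0, -0.5, 0.2, 0.75  # made-up values

updated = w + alpha * step            # "new" version of the same network
blended = c * w + (1 - c) * updated   # final_new = c*old + (1-c)*new
direct = w + (1 - c) * alpha * step   # one plain step with rate (1-c)*alpha

assert abs(blended - direct) < 1e-9
```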

But in the case of QN and TQN, as we are discussing, a step of gradient descent with Soft Update makes TQN a weighted average of itself not with its own updated version, but with QN's new updated version (the update being calculated by gradient descent using the QN weights, not TQN's own weights).

do you agree?

thank you for the help!

Douglas

Hello Douglas again,

Hmmm, you are speaking as if I already know what the updated value will be before I have actually updated it. Leaving aside whether you can write this equation mathematically, does it really make sense? (Sorry to be a bit blunt here, but I hope this delivers my doubt clearly, for the sake of an effective discussion)

I can’t prove this.

Agreed.

Cheers,
Raymond

Thanks @rmwkwok, I realized the definition was ambiguous, but I've left it as is for the general idea.
There is a correct way to express it, but it's not worth writing out because my initial doubts are already answered.
Thanks for your help.

Douglas


You are welcome Douglas!

Raymond