Is the soft-update approach used only with mini-batch / stochastic GD?

The soft-update approach allows us to update our NN parameters more “conservatively”.

  1. Am I right that this concept makes no sense with batch gradient descent, where the whole training set is used at each GD step? Because in that case our new parameters are definitely better than the previous ones (at least according to all the data we have)?
  2. If the answer to point 1 is “yes”, am I right that we can also use it for supervised learning?

Hello @m.kemarskyi.ip82,

Let me ask a clarifying question: are you asking about using soft update with batch gradient descent in a Reinforcement Learning neural network? My focus is on whether it is a Reinforcement Learning neural network.


@rmwkwok Yes. But I guess there should be no difference between supervised & reinforcement. Am I wrong?

Hello @m.kemarskyi.ip82,

The clarification is helpful, thanks. We can organize your questions in this table:

|                | mini-batch GD | batch GD |
|----------------|---------------|----------|
| RL soft update | A             | B        |
| SL soft update | C             | D        |

First, some background:

This lecture video compares batch and mini-batch GD. Basically, we can learn faster with mini-batch GD, but we will also suffer from more “oscillations” because the gradient varies from one mini-batch to the next.
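To make the “oscillations” point concrete, here is a minimal sketch that compares one full-batch gradient with the gradients from individual mini-batches. The toy dataset (`y = 3x` plus noise), the one-parameter model and the batch size of 8 are all made up for illustration and do not come from the lecture or the assignment.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical toy dataset: y = 3x + noise, 64 examples.
xs = [random.uniform(-1.0, 1.0) for _ in range(64)]
ys = [3.0 * x + random.gauss(0.0, 0.5) for x in xs]


def mse_gradient(w, batch_x, batch_y):
    """Gradient of the mean squared error for the model y_hat = w * x."""
    n = len(batch_x)
    return sum(2.0 * (w * x - y) * x for x, y in zip(batch_x, batch_y)) / n


w = 0.0          # current parameter value
batch_size = 8   # assumed mini-batch size for this sketch

# Batch GD: a single gradient computed from every example.
full_gradient = mse_gradient(w, xs, ys)

# Mini-batch GD: one gradient per chunk of 8 examples.
mini_gradients = [
    mse_gradient(w, xs[i:i + batch_size], ys[i:i + batch_size])
    for i in range(0, len(xs), batch_size)
]

# How much the mini-batch gradients disagree with each other.
spread = statistics.pstdev(mini_gradients)

print(f"full-batch gradient:        {full_gradient:.3f}")
print(f"mini-batch gradients:       {[round(g, 3) for g in mini_gradients]}")
print(f"spread across mini-batches: {spread:.3f}")
```

The mini-batch gradients average out to the full-batch gradient here (the batches are equal-sized and cover the data exactly once), but each individual one points somewhere slightly different. That disagreement is the source of the oscillations.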

Second, comparing A with B:

  • A is what the week 3 assignment 3 is all about.
  • For B, you can adjust the mini-batch size in the assignment to see the difference yourself. You only need to open the “” and then change the MINIBATCH_SIZE value from 64 to a larger number. However, I recommend doing this only after you have passed the assignment, so that the experiment won’t mess up your grade.

You don’t have to do it, but if you run the experiments and share your findings about the difference between A and B, we can discuss further.

Third, comparing C with D:

There is no lab that uses soft update in an SL neural network. However, we know intuitively what a soft update does: from one perspective, it slows down the update by dragging the newly learned state back towards the old state. Since mini-batch GD learns faster, the dragging effect should be larger there. If I compare the learning speeds of the following four:

|                   | mini-batch GD | batch GD |
|-------------------|---------------|----------|
| SL soft update    | C             | D        |
| no SL soft update | E             | F        |

I would guess E > C > D > F. If we want to learn fast, then obviously we go for E. However, if we want less of the drawback (the oscillations), then we can choose between C, D and F.

Having said that we can choose between them, does it mean soft update is helpful in SL? I would say no, because you don’t need soft update to slow down the learning speed. Instead, you only need a larger mini-batch size to slow it down.

Therefore, I think the mini-batch size is sufficient for us to control the speed and we don’t need soft-update in SL for that purpose.
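The “dragging back” effect of a soft update can be sketched in a few lines. This is only an illustration of the general idea: the function name `soft_update`, the parameter name `tau` and the numbers are assumptions, not code from the assignment.

```python
def soft_update(old_params, new_params, tau):
    """Blend new parameters into old ones: tau * new + (1 - tau) * old."""
    return [tau * new + (1.0 - tau) * old
            for old, new in zip(old_params, new_params)]


old = [0.0, 0.0]        # parameters before the learning step (made-up values)
learned = [1.0, -2.0]   # parameters a plain GD step would have produced

# tau = 1.0 reproduces the plain update; a smaller tau drags the result
# back towards `old`, which is the slowing-down effect discussed above.
for tau in (1.0, 0.5, 0.1):
    print(tau, soft_update(old, learned, tau))
```

With `tau = 0.5` the result lands exactly halfway between the old and the newly learned parameters. The smaller `tau` is, the closer the result stays to the old ones.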

Knowing that soft-update is not needed in SL, following the same logic, do I really need soft-update in RL?

My answer is that we need it. There is one critical difference between RL and SL. In SL, we have the full dataset in hand, but in RL, we have to explore it along the way. If we say the SL dataset already gives the full picture of the world, then the RL dataset at any moment covers only a small part of the world: the part that the agent is currently exploring.

Suppose the RL model has learned one small part of the world and then starts learning a second small part. Without soft update, the model can shift itself from the old part to the new part. Too much shifting, or complete shifting, is NOT desired. So soft update here does not just slow the learning down: effectively, it drags the model from the new small part of the world back towards the old one. This maintains a kind of balance, with the hope that the model generalizes better overall instead of only generalizing better to the new small part of the world.
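One way to see this balance is to unroll the soft-update rule. The notation below is an assumption for illustration: $w_k$ stands for the weights freshly learned at step $k$, $w^{-}_{k}$ for the softly updated weights, and $\tau \in (0, 1]$ for the update rate.

```latex
\begin{aligned}
w^{-}_{k} &= \tau\, w_{k} + (1-\tau)\, w^{-}_{k-1}, \qquad k = 1, \dots, K \\
w^{-}_{K} &= (1-\tau)^{K}\, w^{-}_{0} + \tau \sum_{k=1}^{K} (1-\tau)^{K-k}\, w_{k}
\end{aligned}
```

With a small $\tau$, the softly updated weights are a weighted average in which weights learned on earlier parts of the world still contribute. With $\tau = 1$ the sum collapses to $w^{-}_{K} = w_{K}$, which is the complete shift to the newest part.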

In summary:

  1. I will let you decide whether you want to experiment with the effect of mini-batch and batch GD in RL.
  2. Soft update is not needed in SL for the purpose of slowing down the learning speed.
  3. Soft update is needed as an extra handle in RL for the sake of generalization, because the incomplete dataset is an extra source of problems.