Soft update in Deep Q network


In the lecture, one of the refinements suggested for DQN was a soft update of Q. The professor also mentioned that this technique can be applied to supervised learning algorithms.

My question is: since we are using a neural network for learning, and we have a cost function such as MSE to guide the learning process, why can't we use cross-validation for model evaluation and do the update as usual (Q = Qnew) instead of taking the soft update?

How does the soft update also help in supervised learning? Suppose that in supervised learning the parameters w and b are updated and the updated model is evaluated against the cross-validation set. If the cost is increasing for a model, then we stop the training and stop updating the parameters. How does the soft update work in this scenario?

When there is a cross-validation set, we already get to know the model's performance.

So, if there is a cross-validation set, is the soft update still necessary?

If yes, why do we need to take only a small percentage of the new model when we know that the model is performing badly on the cross-validation set?


Hello @bhavanamalla,

The key missing piece in your reasoning is that we don't necessarily have a full, representative set of data in a reinforcement learning setting.

In a reinforcement learning setting, we face a very large state-action space, and we assume we are never going to have a full dataset covering every state-action pair. Instead, due to memory limitations, we only remember a very small subspace of the state-action space. Consequently, each training step is done on an unrepresentative sample (one that covers only a small subspace of state-action pairs). In that case, if we train the model without the soft update, the model will quickly shift itself completely toward the small subspace described by the current training data AND will "forget" the previous training data, which represented a different subspace.

Performing the soft update slows down that shifting: it drags the model back from fitting the current training data too well, in the hope that it will keep more of the previous experience.
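To make this concrete, here is a minimal sketch of the soft-update rule W ← (1 − τ)·W + τ·W_new applied to a list of weight arrays. The weights are plain NumPy arrays, and the names `soft_update` and `TAU` are just illustrative, not from any particular library:

```python
import numpy as np

TAU = 0.001  # illustrative soft-update rate; a small value keeps the target slow-moving


def soft_update(target_weights, new_weights, tau=TAU):
    """Blend freshly trained weights into the target network's weights.

    Instead of the hard update (target = new), keep most of the old
    weights so the target drifts only slowly toward the new estimate.
    """
    return [(1.0 - tau) * w_old + tau * w_new
            for w_old, w_new in zip(target_weights, new_weights)]


# Toy example: two "layers" of weights, old values 0, newly trained values 1.
q_target = [np.zeros(3), np.zeros(2)]
q_new = [np.ones(3), np.ones(2)]

q_target = soft_update(q_target, q_new)
# Each weight moves only a fraction tau of the way from 0 toward 1,
# so one noisy training round cannot overwrite everything learned so far.
```

Setting `tau=1.0` recovers the hard update Q = Qnew discussed above; the small τ is what keeps the model from shifting entirely to the current subspace of experience.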

In contrast, in the usual supervised learning setting, we have a full dataset that is representative of the whole problem space, and we randomly sample batches from it for training. The model is therefore always trained on more or less the same distribution of data. In a reinforcement learning setting, however, to repeat, we assume we do not have that full dataset from the beginning.


Thanks, Raymond!

So, according to this, in a supervised setting the soft update is not necessary, as we already have a representative set of data and can train on the entire dataset.

I know this is a silly question, but I am asking just to make this clear to myself. When considering mini-batches in a supervised setting, is that similar to the subspace scenario of reinforcement learning? Since the model trains on a small set of data in every iteration, updates the parameters, then takes another mini-batch and continues the same process, is the soft update a good choice when updating the parameters in a mini-batch setting?

Hello @bhavanamalla,

I would not compare mini-batching with the situation in reinforcement learning, because in mini-batching the sampling is random with respect to the whole training set, which is supposed to be representative, and training passes over the whole training dataset again and again. These are conditions we don't have in reinforcement learning. I would not look at just one mini-batch and conclude it is a similar case while forgetting the factors above.

Moreover, I usually don't prescribe when to do what and when not to, because such choices are usually open to experiment and challenge. The soft update in supervised learning is not difficult to try.

Also, you only argued one side of the problem: supporting the soft update in supervised learning. Perhaps in the future, when this comes up again, you can try to argue against it. There could be some genuinely convincing arguments against it too, but I will leave that for you to think about.