In the deep learning era, when we apply regularisation to fix a high variance problem, will bias be affected?
And when we increase the complexity of the neural network to fix a high bias problem, does that lead to overfitting?
Or are bias and variance independent of each other, i.e. reducing bias will not increase variance and reducing variance will not increase bias?
Please explain.
This is an interesting question, but the answer depends on your current model structure and the changes you apply to it.
For example, if you have a high bias problem, you may increase the size of your NN, but this change may be relatively small and not enough to make the network overfit significantly.
In the same way, applying techniques to reduce variance, like regularization, dropout, etc., might not increase the bias significantly if the model was not overfitting.
In conclusion, you’ll have to test your model Bias and Variance with each change to really see how it is affected. To that end, you can follow Andrew’s recipe: Basic Recipe for Machine Learning
First of all, thanks for your explanation @javier.
Does increasing the width mean adding more hidden layers?
and
“In the same way, applying techniques to reduce Variance, like regularization, dropout, etc. might not increase the Bias significantly if the model was not overfitting.”
From your explanation,
we apply regularisation or dropout only when the model is overfitting, in order to reduce the overfitting. But in the last line you said bias will not increase if the model was not overfitting.
Please explain.
As @javier pointed out, the exact actions depend heavily on the model we are working on. The process is iterative, meaning that:
We decide what actions to apply to reduce bias or variance in the model.
We estimate the errors on a train set and a dev set (sometimes called a validation set).
If we are not satisfied with the results, we decide again what actions to apply.
To understand which problem we are facing, we estimate the error on the train set and the error on the dev set, and then compare them:
If training set error is high, we have a high bias problem.
If training set error is low, but dev set error is high, we have a high variance problem.
If both of the errors are high, we have high bias and high variance.
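As a rough illustration, the comparison above could be sketched in Python like this. The function name, `target_error`, and `tolerance` are hypothetical and not from the course. The sketch also reads "dev set error is high" as "dev error is noticeably higher than train error", which is one possible interpretation of the rules rather than the only one.

```python
def diagnose(train_error, dev_error, target_error=0.0, tolerance=0.02):
    """Label a model from its train and dev errors.

    target_error: the error we would be happy with (assumed, task dependent).
    tolerance: how large a gap counts as "high" (assumed, task dependent).
    """
    # High bias: the model does not even fit the training set well.
    high_bias = (train_error - target_error) > tolerance
    # High variance: the model fits the train set much better than the dev set.
    high_variance = (dev_error - train_error) > tolerance

    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias"
    if high_variance:
        return "high variance"
    return "ok"


print(diagnose(train_error=0.01, dev_error=0.11))  # low train error, large gap
print(diagnose(train_error=0.15, dev_error=0.16))  # high train error, small gap
print(diagnose(train_error=0.15, dev_error=0.30))  # high train error, large gap
```

Running the same check before and after each change to the model shows whether the change actually moved bias, variance, or both.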
The rule of thumb is to achieve a low error on the train set first. That means we address a high bias problem first. To do that we can:
Increase model capacity (e.g. increase the number of hidden units and/or layers).
Increase mini-batch size.
Use additional features.
Just train our model for a longer time.
Only once we have a low error on the training set do we start working on reducing the error on the dev set. That means we start addressing a high variance problem. For that purpose we can:
Hi @ajaykumar3456, there is not much that I can add to @manifest's great explanation.
But just to clarify, the point I tried to make is that applying a technique to reduce bias will not always increase your variance significantly, and applying a technique to reduce variance will not always increase bias significantly.
So you should always test Bias and Variance before and after making changes to the model.
Hi @manifest
In the course I think we only learned how the mini-batch size increases training speed. Could you please explain how the mini-batch size affects bias as well?
Sure. Small batches provide some regularizing effect, perhaps due to the noise they add to the learning process. If we have high bias, we may want to decrease that regularizing effect, and using larger mini-batches is one way to do it.
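To make the "noise" part concrete, here is a small sketch using only the Python standard library. It does not train a network. It only simulates mini-batch gradient estimates under the assumption that each per-example gradient is a true value of 1.0 plus unit Gaussian noise. The function name and all numbers are hypothetical. It shows that the spread of the mini-batch estimate shrinks as the batch grows; whether that noise acts as regularization in a real model is a separate question.

```python
import random
import statistics


def minibatch_gradient_std(batch_size, n_batches=2000, seed=0):
    """Standard deviation of simulated mini-batch gradient estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_batches):
        # Assumed toy model: per-example gradient = 1.0 + Gaussian noise.
        batch = [1.0 + rng.gauss(0.0, 1.0) for _ in range(batch_size)]
        # The mini-batch gradient is the average over the batch.
        estimates.append(sum(batch) / batch_size)
    return statistics.pstdev(estimates)


small = minibatch_gradient_std(batch_size=4)
large = minibatch_gradient_std(batch_size=256)
print(f"std with batch size 4:   {small:.3f}")
print(f"std with batch size 256: {large:.3f}")
```

In this toy setting the spread falls roughly as 1/sqrt(batch_size), so the larger batch gives a much less noisy update direction.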
Hi @manifest
I thought that small batches, because of the noise, prevent the optimizer from converging close to the minimum (if we have not decayed alpha), so the effect is similar to early stopping. If that is right, you can decrease this regularizing effect not only by increasing the batch size but also by decaying alpha. Am I right?