Large Neural Networks and bias/variance

In week 3, in the video "Bias/variance and neural networks", these two things are said:

  • Larger neural networks trained on small/moderate-sized data are usually low bias machines
  • A large neural network without appropriate regularization tends to overfit

Could anyone help me understand the reason behind these 2 statements?
thanks! :slight_smile:

Hello @juliandniz

I can find your first quote in that video, but not the second one in its exact wording (perhaps you only paraphrased the idea?). For the first one, let me quote a bit more:

And it turns out that large neural networks, when trained on small to moderate sized datasets, are low bias machines. And what I mean by that is, if you make your neural network large enough, you can almost always fit your training set well. So long as your training set is not enormous.

As the lecture explains, if your NN is large enough, you can almost always fit your training set well. Moreover, if it turns out to fit only the training set well but not the validation set, then it overfits (which is what your second quote is about).
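Here is a minimal sketch of how that shows up in practice (names like `X_train`, `X_cv` and the error values are just illustrative, not from the course): compare the error on the training set with the error on the validation set.

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean squared error between predictions and targets."""
    return float(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2))

# Hypothetical usage after fitting some model:
# train_err = mse(model.predict(X_train), y_train)  # e.g. 0.02 -> fits the training set well (low bias)
# cv_err    = mse(model.predict(X_cv), y_cv)        # e.g. 0.40 -> much worse on unseen data
# A cv_err far above train_err is the signature of overfitting (high variance).
```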

Starting from 2:23, the lecture suggests a procedure that iterates between reducing bias and reducing variance. We enlarge the NN to reduce bias (which may or may not increase variance), and if it does increase variance, we add data to counter that. However, if we cannot add more data, then regularization is our way out.

Therefore, after we enlarge the NN to reduce bias and fit the training set well, if the validation set performs poorly, we have no new data to add, AND we do not regularize properly, then the model stays at high variance (overfit).
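If regularization is the way out, a minimal Keras sketch could look like the following (the layer sizes and the lambda value are placeholders to be tuned on the validation set, not values from the lecture):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

lambda_ = 0.01  # L2 regularization strength; tune it using validation performance

# A reasonably large network (to keep bias low) with L2 regularization on each
# hidden layer to keep variance in check when more data is not available.
model = tf.keras.Sequential([
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(lambda_)),
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(lambda_)),
    layers.Dense(1)  # single output, e.g. for a regression task
])

model.compile(optimizer="adam", loss="mse")
# model.fit(X_train, y_train, epochs=100, validation_data=(X_cv, y_cv))
```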

Anything unclear?

Cheers,
Raymond