Basic Recipe for ML - Week 1 - Train larger/More data?

I am a bit confused about Train larger for high bias and more data for high variance.

By “train larger,” do we mean splitting the data so that we have more training examples? And by “more data,” do we mean increasing the total number of examples altogether?

So if I have 2000 training examples with a 60/20/20 split and I see a high bias problem, one of the solutions is to increase the train set? Like 70/15/15?

And if I have high variance, by saying “more data” we mean get like 2500 training examples?

High bias is underfitting; high variance is overfitting.

When it says train larger, it means using a more complex model architecture so that the model fits the training data better.

When it says more data, it means that because of overfitting you need more data to present scenarios that are different from those that the training set has, so that the data as a whole describes a more inclusive distribution.
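To make the distinction concrete, here is a rough sketch of how you might read the two error numbers. The function name `diagnose`, the `target_error` baseline, and the 2% `gap_tolerance` threshold are all made up for illustration; pick values that make sense for your own problem.

```python
def diagnose(train_error, dev_error, target_error=0.0, gap_tolerance=0.02):
    """Label a model as high bias and/or high variance from its error rates.

    All thresholds here are illustrative assumptions, not course-defined values.
    """
    issues = []
    # High bias (underfitting): the model does poorly even on the training set.
    if train_error - target_error > gap_tolerance:
        issues.append("high bias")
    # High variance (overfitting): dev error is much worse than training error.
    if dev_error - train_error > gap_tolerance:
        issues.append("high variance")
    return issues


print(diagnose(0.15, 0.16))  # underfits: bigger model or train longer
print(diagnose(0.01, 0.11))  # overfits: more data
print(diagnose(0.15, 0.30))  # both problems at once
```

An empty list means neither gap exceeded the tolerance, so neither remedy is clearly called for.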

you need more data to present scenarios that are different from those that the training set has

So increase the split ratio or increase the whole dataset in general?

Either, but the important thing is that you need data that better represents the underlying distribution.
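For the numbers in the question, a quick sketch of how many training examples each option gives. The helper `split_sizes` is hypothetical and just does the arithmetic; it does not shuffle or split any real data.

```python
def split_sizes(n_examples, train_pct, dev_pct):
    """Return (train, dev, test) counts for integer percentage splits."""
    n_train = n_examples * train_pct // 100
    n_dev = n_examples * dev_pct // 100
    # Whatever is left over goes to the test set.
    n_test = n_examples - n_train - n_dev
    return n_train, n_dev, n_test


print(split_sizes(2000, 60, 20))  # (1200, 400, 400)
print(split_sizes(2000, 70, 15))  # (1400, 300, 300)
print(split_sizes(2500, 60, 20))  # (1500, 500, 500)
```

Changing the ratio grows the training set but shrinks the dev and test sets, while collecting more examples grows all three.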

Thank you for the info!

High bias means the overall accuracy for both training and test sets is low.

The line says train longer, not train larger.

Oh dmn, you’re right. This makes more sense haha