Avoidable bias vs variance trade-off

In DLS/C3/W1, avoidable bias refers to the difference between the train set error and Bayes error, while the difference between the dev set error and the train set error is called variance. In previous videos, variance increased when you over-fitted, which Andrew describes as happening only if your train set error is smaller than Bayes error. My doubts are:
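For concreteness, the two gaps defined above can be computed directly from the error numbers (all values below are made up for illustration):

```python
# Hypothetical error numbers, purely for illustration.
bayes_error = 0.01   # estimated, e.g. from human-level performance
train_error = 0.015
dev_error = 0.10

avoidable_bias = train_error - bayes_error  # small: the model fits well
variance = dev_error - train_error          # large: poor generalization

# Small avoidable bias + large variance is the over-fitting signature:
# the model is near Bayes level on the train set but fails on the dev set.
print(f"avoidable bias: {avoidable_bias:.3f}, variance: {variance:.3f}")
```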

  1. If the train set error and Bayes error are close, but the dev set error is far from the train set error, am I not already over-fitting? The only solution I see is to accept a bit more error on the train set in order to decrease the error on the dev set. How can the train set error stay close to Bayes error (small avoidable bias) while only the dev set error (variance) decreases?
  2. I’m familiar with techniques to decrease variance such as regularization, increasing the training set size, early stopping, changing the network topology, etc. Don’t all of these also affect (increase) avoidable bias?
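One of the techniques listed above, early stopping, can be sketched in a few lines. This is a minimal sketch; the dev-error trajectory and the `patience` threshold are made-up values, not from the course:

```python
# Minimal early-stopping sketch over a hypothetical dev-error trajectory:
# stop training once the dev error has not improved for `patience` epochs.
dev_errors = [0.30, 0.22, 0.18, 0.15, 0.14, 0.15, 0.16, 0.17]  # made-up

patience = 2
best_error = float("inf")
best_epoch = 0
for epoch, err in enumerate(dev_errors):
    if err < best_error:
        best_error, best_epoch = err, epoch
    elif epoch - best_epoch >= patience:
        # Dev error stopped improving: halt before over-fitting grows.
        break

print(f"stopped at epoch {epoch}; best dev error {best_error} at epoch {best_epoch}")
```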

For instance, my understanding of regularization is that you force the high-order coefficients to take very small (negligible) values by making the system minimize the cost function plus the added regularization term. In this way your model fits the train set data less well (you have fewer high-order polynomial terms to fit complicated shapes) and you avoid over-fitting, i.e. you generalize better on unseen data. But by doing this, aren’t you moving further away from the Bayes error level? After all, if we assume humans are at Bayes level, humans can identify very well how to fit a line that separates wanted from unwanted data, no matter how many “high-order polynomial” terms our brains need to use.
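The mechanism described above can be sketched as an L2 penalty added to a logistic-regression cross-entropy cost. This is a minimal sketch, not course code; the function name and toy setup are illustrative:

```python
import numpy as np

def l2_regularized_cost(w, b, X, Y, lam):
    """Cross-entropy cost plus the L2 penalty (lam / (2m)) * ||w||^2.

    Larger `lam` pushes the weights toward zero, smoothing the decision
    boundary: train error typically rises, dev error can fall.
    """
    m = X.shape[1]                                    # number of examples
    A = 1.0 / (1.0 + np.exp(-(w.T @ X + b)))          # sigmoid activations
    cross_entropy = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    l2_penalty = (lam / (2 * m)) * np.sum(w ** 2)
    return cross_entropy + l2_penalty
```

With `lam = 0` this reduces to the plain cross-entropy cost; any `lam > 0` can only add to the cost for nonzero weights, which is exactly the pressure that shrinks them.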

Hi @vmmf89 ,
first of all, welcome to this community!
About your questions:

  1. Yes, you are over-fitting if the train and dev errors are not close. To avoid over-fitting you can use regularization, for example. However, this will most likely increase your train error, moving it away from Bayes error again.
  2. Yes, they do, by definition.
    Your explanations make total sense to me, well done and keep making progress 🙂

Hola Carlos,

Thank you for your answer. In your opinion, why is the difference between the train and dev set errors called variance? Isn’t high variance something that indicates you over-fitted your training set and your network now doesn’t generalize well on the dev set? If your train set is at the Bayes error level (you have not over-fitted yet), why would you then see a higher error on your dev set?

One possible example I could think of (a bit too elaborate, in my opinion) was given by Andrew when he was explaining covariate shift in the video “Why does Batch Norm work?”. Let’s say my network achieves Bayes level on black cats in the training set (i.e. it is as good as a human at identifying black cats), but it doesn’t generalize well to cats of other colors, and it happens that all the cats of other colors ended up in the dev set (that’s why I always shuffle the data beforehand, to avoid this).
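The shuffle-before-splitting step mentioned above might look like this (a minimal NumPy sketch with stand-in data and a made-up 80/20 split):

```python
import numpy as np

# Shuffle the examples before the train/dev split, so that (for instance)
# all cats of one color don't end up concentrated in the dev set.
rng = np.random.default_rng(0)   # fixed seed for reproducibility

n_examples = 10
X = np.arange(n_examples)        # stand-in features, one entry per example
perm = rng.permutation(n_examples)
X_shuffled = X[perm]

split = int(0.8 * n_examples)    # 80% train / 20% dev
X_train, X_dev = X_shuffled[:split], X_shuffled[split:]
print(X_train, X_dev)
```

In practice the same permutation would be applied to the labels as well, so that each example stays paired with its label.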