Hello everyone,
So when speaking about bias / variance, Andrew mentions that the training set error and the test set error are compared to a base error, so a 5% training set error might be high if the base error is ~0%, but low if the base error is ~8%. For simple image classification (cat / not cat), I understand that the base error is ~0%, since human beings are assumed to classify such images almost perfectly. But is there an objective way to clearly define the base error for a different task? Let's say I'm looking at chest x-ray images to classify them as Covid positive / negative, or I'm trying to predict whether a user will click on an ad. It's not obvious how well humans would perform at predicting these, and for these two examples I would guess the base error is higher than 0%; but is it 5%? 10%? How should we think about the base error in these types of situations?
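Just to make my first question concrete, here is the kind of comparison I have in mind, as I understood it from the lecture (a minimal Python sketch with made-up numbers, not real data):

```python
# Made-up numbers only, to spell out the comparison against a base error:
#   avoidable bias = training error - base (Bayes) error
#   variance       = test error     - training error
train_error = 0.05
test_error = 0.07

for base_error in (0.00, 0.08):
    avoidable_bias = train_error - base_error
    variance = test_error - train_error
    print(f"base={base_error:.0%}  avoidable bias={avoidable_bias:+.0%}  variance={variance:.0%}")

# With a base error of ~0%, the 5% training error leaves ~5 points of avoidable bias.
# With a base error of ~8%, that same 5% is already at or below the floor, so bias
# would not be the thing to work on.
```

My difficulty is that this whole comparison hinges on the base_error number, and for the x-ray or ad-click examples I don't see how to pin that number down.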
A second question, regarding the bias error. Andrew mentions that adding more training data will not decrease the training error, although it will decrease the test error. But in practice, why can't we reduce the training error by adding more training data? I'm thinking of a model that tries to predict Y from, let's say, 10 training examples, and has a certain training error. If I then grow my training set to 100k examples and still use the exact same model, it makes sense to me that the model can now pick up consistent patterns that are visible with 100k examples but were invisible with such a small training set of just 10. In this extreme scenario, shouldn't adding more training data decrease the training error?
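To show exactly what I mean by "same model, 10 vs 100k examples", here is a toy simulation of that thought experiment (made-up data with some irreducible noise, and a plain LinearRegression from scikit-learn as the fixed-capacity model):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def training_mse(n):
    """Fit the same fixed-capacity model on n examples and measure error on those same examples."""
    X = rng.uniform(-1, 1, size=(n, 1))
    y = 3 * X[:, 0] + rng.normal(0, 0.5, size=n)   # linear signal plus irreducible noise
    model = LinearRegression().fit(X, y)
    return np.mean((model.predict(X) - y) ** 2)    # MSE on the training set itself

print("training MSE with n=10:     ", training_mse(10))
print("training MSE with n=100000: ", training_mse(100_000))
```

My expectation would be that the second number comes out lower than the first, but Andrew's statement suggests otherwise, and I'd like to understand why.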
Thanks.