# How does Bayes error explain a lower error rate on the dev/test sets?

A case was made in week 2 that a higher Bayes’ error could account for lower error rates on the dev/test set compared to the train-dev set. I don’t see how a higher Bayes’ error (i.e., a fundamental limit on the accuracy of classification) could improve the dev/test results.

Can you point to the lecture where this is stated? The name of the video and the time offset, please.

Without actually hearing what Prof Ng specifically said, my guess is that you are just misinterpreting what he said. The Bayes Error is the Bayes Error, right? You can’t change that, although you actually have no way to really measure it either for that matter. It is a question of interpreting what the training and dev/test errors mean in light of having a relatively high Bayes Error. E.g. maybe it means that the relatively high dev/test errors are to be expected because you just can’t do any better because of the high Bayes Error.

But once I can listen to the section you are asking about, there’s probably a better explanation than what I said above …

The way I understood it, I think Prof Ng was trying to explain an unexpected result i.e. dev/test set errors are lower than training/training-dev set errors. And this can happen if the training data is from a different distribution, which is harder to predict correctly compared to the target data distribution (dev/test set distribution).

For example, the training data could be blurry, low-res, or occluded, while the dev/test set might have fewer or none of these issues. In this case, the Bayes error on the training set is higher than the Bayes error on the dev/test set.

Although, I do agree that I didn’t think Bayes error could change based on the distribution of data … but looks like it does.
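To make that concrete, here is a minimal sketch of a toy problem, not anything from the course: two equally likely classes whose single feature is Gaussian with the same spread. The class means, the `sigma` values, and the function name `bayes_error_two_gaussians` are all assumptions chosen for illustration. For this setup the best possible classifier thresholds at the midpoint, so the Bayes error has a closed form and can be compared for a "clean" and a "blurry" distribution.

```python
import math


def bayes_error_two_gaussians(mu0: float, mu1: float, sigma: float) -> float:
    """Bayes error for two equally likely classes, N(mu0, sigma^2) vs N(mu1, sigma^2).

    The optimal rule thresholds at the midpoint of the means, which gives
    an error of Phi(-|mu1 - mu0| / (2 * sigma)), with Phi the standard normal CDF.
    """
    z = abs(mu1 - mu0) / (2.0 * sigma)
    # Phi(-z) expressed through the complementary error function
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical distributions: same class means, different amounts of noise
clean = bayes_error_two_gaussians(0.0, 2.0, sigma=0.5)
blurry = bayes_error_two_gaussians(0.0, 2.0, sigma=1.0)

print(f"Bayes error, clean data:  {clean:.4f}")
print(f"Bayes error, blurry data: {blurry:.4f}")
```

In this toy setting the noisier distribution has the higher Bayes error even though the task (separating the two classes) is the same, which is the sense in which Bayes error can depend on the data distribution.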

Nidhi

I have the same question. It comes from the feedback on one quiz question. In this question, the Dev and Test errors (1.3% and 1.1%) are lower than the Training and Training-Dev errors (2% and 2.3%). The human-level error is around 0.5%. The question asks whether the Bayes error for the dev/test distribution is higher than for the training distribution. I selected False and it was marked incorrect. The feedback says that the Bayes error for the data distribution of the Training-Dev set is higher. First, I don’t understand why my answer is wrong. Second, the feedback seems to support my selection.

Thanks!

The statement “Bayes error is higher” is very confusing.

Given in the question:
Training set: 2% error
Train-dev set: 2.3% error
Dev set: 1.3% error
Test set: 1.1% error

“You also know that human-level error on the road sign and traffic signals classification task is around 0.5%. Based on the information given you conclude that the Bayes error for the dev/test distribution is higher than for the train distribution. True/False?”
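As a rough sketch, the usual gap breakdown can be computed from these numbers. The function name `gap_breakdown` and the dictionary keys are my own labels, and human-level error is used only as a proxy for Bayes error, which is an assumption rather than a measurement.

```python
def gap_breakdown(human: float, train: float, train_dev: float,
                  dev: float, test: float) -> dict:
    """Return the gaps (in percentage points) between consecutive error rates."""
    return {
        # human-level error stands in for Bayes error here (an assumption)
        "avoidable_bias": train - human,
        # same distribution as training, but unseen data
        "variance": train_dev - train,
        # change of distribution from train-dev to dev
        "data_mismatch": dev - train_dev,
        # how much the model is tuned to the dev set
        "dev_overfit": test - dev,
    }


# Error rates (in percent) quoted in the quiz question
gaps = gap_breakdown(human=0.5, train=2.0, train_dev=2.3, dev=1.3, test=1.1)
for name, value in gaps.items():
    print(f"{name}: {value:+.1f}")
```

The data-mismatch gap comes out negative, meaning the model does better on the dev/test distribution than on held-out data from the training distribution. That is the unexpected result the question is built around.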

@paulinpaloalto mentioned that
" It is a question of interpreting what the training and dev/test errors mean in light of having a relatively high Bayes Error. E.g. maybe it means that the relatively high dev/test errors are to be expected because you just can’t do any better because of the high Bayes Error."

Does this mean that a higher Bayes error leads us to expect a higher error rate?
If so,
the training set has 2% error and
the dev set has 1.3% error,
so by this argument, the Bayes error on the training set should be expected to be HIGHER.

Yet the question states:
“conclude that the Bayes error for the dev/test distribution is higher than for the train distribution. True/False?”

And the quiz states that the correct answer is TRUE.

I hope a mentor can provide some insight.

Many thanks.

Sorry, my statement that you quote was only based on the original post on this thread, which did not give as clear a statement of what the quiz question says. Julie’s later post made that a lot more clear and based on that, I think what I said earlier is not relevant. Sorry.

Notice that it says that the human error on this task is 0.5%. That means (by definition) that the Bayes Error is <= 0.5% because the Bayes Error is the best that can theoretically be done, right? Notice also that they don’t say “Human Error on the training set or the test set”. They just say “the Human Error on this task”. I make no claim to really have the complete answer on any of these issues, but it was always my understanding that the terms Human Error and Bayes Error applied to the task as a whole, not to a particular set of data.

Ok, with that said, let’s get to the actual quiz question and answer.

Ok, I would say you are right that the answer there should be False. For one of two reasons:

1. If you buy my argument that the Bayes Error doesn’t apply to any particular dataset, then it would be False because the Bayes Error can’t be higher on the dev/test set.

or

2. If you don’t buy my argument in 1), then it should still be False because if anything that would imply the dev/test set is from an easier distribution than the training set, meaning that the Human Error and Bayes Error on the dev/test data would be lower, not higher.

So I agree with your overall point that this seems like a bug in the quiz answer. I will ask the course staff to have a look and let you know if I hear anything back …

@paulinpaloalto My heartfelt thanks for your excellent explanation. I absolutely agree that Bayes error does NOT apply to a particular dataset. The Bayes error is by definition the best possible error that we could achieve on a particular task, e.g. identifying a cat. Thus the Bayes error applies to every set of data, whether it is the training, dev, or test set, and it should be the SAME since all the datasets are framed under the same optimization problem. That said, it is meaningless to compare the Bayes error across the training, dev, and test sets. In particular, if we use human performance as an estimate of Bayes error, shouldn’t the error % be consistent no matter how we randomly split or shuffle the data?
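The random-split intuition can be checked with a small simulation. This is only a sketch under stated assumptions: the data is a made-up one-feature problem with two equally likely Gaussian classes, and the names `split_error_rates`, `n`, `sigma`, and `seed` are hypothetical. It covers only the case where all three sets are random splits of one pool, not the case where the training data comes from a different distribution.

```python
import random


def split_error_rates(n: int = 60000, sigma: float = 1.0, seed: int = 0) -> dict:
    """Draw n points from ONE distribution, shuffle, split three ways, and
    return the error rate of the midpoint-threshold rule on each split."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)            # two equally likely classes
        x = rng.gauss(2.0 * label, sigma)    # class means 0 and 2 (hypothetical)
        data.append((x, label))
    rng.shuffle(data)

    third = n // 3
    splits = {
        "train": data[:third],
        "dev": data[third:2 * third],
        "test": data[2 * third:],
    }
    rates = {}
    for name, rows in splits.items():
        # predict class 1 when x is above the midpoint of the class means
        wrong = sum((x > 1.0) != (label == 1) for x, label in rows)
        rates[name] = wrong / len(rows)
    return rates


rates = split_error_rates()
for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
```

With splits drawn from the same pool, the three error rates differ only by sampling noise, so a gap as large as the one in the quiz points to a difference in distribution rather than to the split itself.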

If we find differences in error rate among the training, dev, and test sets, we should do error analysis to figure out the possible sources of error. If we can’t find them, Bayes error cannot give any clues; rather, the gap is due to differences between the datasets that we, as humans, cannot spot.

A million thanks indeed for providing this useful insight. It really deepened my understanding.