Establishing the baseline performance

Hello everyone,

I have a question in regard to the lecture about establishing the baseline and I hope someone can help me with that.

The whole premise of establishing the truth baseline (for example compared with humans) is to set a truth point and then judge whether the J_train or J_CV is large or not in comparison to the baseline. In another word, giving a more meaningful understanding of the numbers that we are getting. So when we agree that human can detect voices with a 10.2% error and our model predict a 10.4% error then we can say that our model is good versus the scenario that it was a 15% error. However, the problem that I have with this is that we just shifted the problem from absolute value to relative value and have not responded to the meaning of the number.

Here the problem rises again as to whether a 0.2% error is reasonable. Let me give you an example so that it is more clear. Imagine that the ground truth is 10.2% and all other reasonable models predict a range of 10.3 ~ 10.5% error. Then is 0.2 reasonable? or the opposite scenario, what if the J error is super sensitive and with the smallest changes in the weights it jumps up to 40% percent then is a 15% error so unreasonable? What I am trying to say is now the problem is not whether the absolute number is meaningful it is the difference.


Hello Navid,

I can see 2 angles here.

First, the 10.2% human error should have come with an error margin because not every single person achieves the exact same error, for example, a person might have 10.5% and another (healthier or in a better condition) person 9.9%. In this sense, a 0.2% difference may not be significant at all comparing with the wide spread of the human’s achievable errors.

Second, our motivations:

  1. in terms of our business goal, is the 0.2% difference unacceptable? If we were planning to replace human with model for a production line’s QA tasks, then I would try to estimate what those 0.2% would represent in terms of the final extra cost generated, and ask myself whether such cost is tolerable, compared to the cost I would have saved by the model.

  2. the 10.2% is really just a baseline target we can rely on and we probably would want to go beyond that after some time, then in this case the 0.2% isn’t that important because we would have a higher target driven by our business need.



Btw, Navid @Navid_Zolfaghari_Moh, how did you add your LinkedIn next to the title?

1 Like