The week 2 slides state:
Baseline gives an estimate of the irreducible error / Bayes error and indicates what might be possible.
The slides also list multiple ways to establish a baseline:
1- Human level performance (HLP)
2- Literature search for state-of-the-art/open source
3- Older system
But I think that HLP is the only baseline that gives an estimate of the irreducible error / Bayes error and indicates what might be possible.
I think there is a difference between HLP and other baselines such as an older system.
We try to make our model as good as HLP, whereas we try to improve upon our older system.
Also, HLP costs extra time and money to collect, whereas an open-source baseline is easier to use.
Also, HLP is the only baseline that can be used to produce the image below. Older systems would give worse accuracy on average.
I think this course focused only on HLP and not on the other weaker, cheaper baselines.
As argued by Prof. Ng in the video ‘Establish a baseline’, HLP is particularly useful with unstructured data. When time or budget does not allow this, other, less optimal, solutions are available as a starting point - such as literature search for state-of-the-art, a quick and dirty implementation, or the performance of older systems. As Prof. Ng indicates, the use of HLP on unstructured data gives an indication of Bayes error.
For structured data this may be different. Here, ways of cross-checking may be used that are not performed directly by humans but rather by different sets of queries or runs of different applications that should arrive at the same end result. Then, the results attained may give a better indication of Bayes error than HLP.
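To make the cross-checking idea a bit more concrete, here is a minimal sketch in Python. It assumes two independent runs (for example two different queries or applications) that should produce the same result for each record. The function name `disagreement_rate` and the sample values are hypothetical, made up purely for illustration. The rate it returns is only a rough signal of how consistent the results are, not Bayes error itself.

```python
def disagreement_rate(results_a, results_b):
    """Fraction of records on which two independent runs disagree.

    Hypothetical helper: results_a and results_b are equal-length
    sequences holding the result each run produced for the same records.
    """
    if len(results_a) != len(results_b):
        raise ValueError("both runs must cover the same records")
    if not results_a:
        raise ValueError("need at least one record to compare")
    mismatches = sum(a != b for a, b in zip(results_a, results_b))
    return mismatches / len(results_a)


# Made-up results from two runs over the same four records.
run_a = [1, 2, 3, 4]
run_b = [1, 2, 0, 4]
rate = disagreement_rate(run_a, run_b)  # one mismatch out of four
```

A low disagreement rate suggests the cross-checked result is a trustworthy reference to compare a model against. A high rate suggests the reference itself needs attention first.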
So, is HLP the only indication of Bayes error (irreducible error)?
If yes, then the others (literature search, older system) are not baselines, right?
‘A’ baseline is not equivalent to Bayes error. It may also be some other measure, as long as it gives you something to aim at. Moreover, as Prof. Ng explains, Bayes error may be a good baseline for unstructured data, but not for structured data.
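As a small illustration of "something to aim at", the sketch below compares one model's accuracy against several baselines at once. All names and numbers are hypothetical and chosen only for the example. They do not come from the course.

```python
def gaps_to_baselines(model_accuracy, baselines):
    """Return the accuracy gap (baseline minus model) for each baseline.

    A positive gap means the model is still below that baseline;
    a negative gap means the model already beats it.
    """
    return {
        name: round(accuracy - model_accuracy, 4)
        for name, accuracy in baselines.items()
    }


# Hypothetical accuracies, for illustration only.
baselines = {"HLP": 0.95, "literature": 0.92, "older_system": 0.85}
gaps = gaps_to_baselines(0.88, baselines)

for name, gap in gaps.items():
    status = "still to close" if gap > 0 else "already beaten"
    print(f"{name}: gap {gap:+.2f} ({status})")
```

With these made-up numbers the model already beats the older system but still trails HLP, which matches the distinction raised above: an older system is a floor to improve upon, while HLP (on unstructured data) hints at how much room is left.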