I chose “Spend a few days checking what is human-level performance for these tasks so that you can get an accurate estimate of Bayes error”.

I am wondering why this cannot be the answer. I thought the first step is to “set up the dev/test set and metric”, and that checking human-level performance belongs to “setting up the metric”.

What exactly does the option “setting up the metric” involve?

Yes.

Not necessarily the **first** thing we should investigate.

We first need to think about why we need an accurate estimate of Bayes error. As Prof Andrew Ng discussed on the slide below, we use it as a guide for whether we should focus on reducing bias or variance to squeeze the last bit of performance out of our model.
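As a toy illustration of that decision rule (the error numbers here are made up for the example, not from the course), comparing the train error against a human-level proxy for Bayes error, and the dev error against the train error, tells you which gap is larger:

```python
# Illustrative numbers only -- a hypothetical image task, not course data.
human_error = 0.01   # human-level performance, used as a proxy for Bayes error
train_error = 0.08
dev_error = 0.10

avoidable_bias = train_error - human_error  # gap to Bayes error estimate
variance = dev_error - train_error          # gap between train and dev sets

if avoidable_bias > variance:
    print("Focus on bias: bigger model, train longer, better architecture.")
else:
    print("Focus on variance: more data, regularization.")
```

With these numbers the avoidable bias (0.07) dominates the variance (0.02), so the analysis points at bias reduction first. Note that the rule is only as good as your Bayes error estimate, which is why the estimate matters once you are near human-level.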

Since it is an image recognition task, we already know we can expect good human-level performance. It would be better to spend our time iterating on the problem and learning from the data we have at hand.

Setting up an optimizing metric, etc., belongs with “Spend a few days training a basic model and see what mistakes it makes.”

In closing, an accurate estimate of Bayes error only becomes important once you are already doing very well on your problem, as Prof Andrew Ng discusses here:

I’d encourage you to build something quick and dirty. Use that to do bias/variance analysis, use that to do error analysis, and use the results of those analyses to help you prioritize where to go next.