Tailored Cost Function

Suppose that, for a particular problem, I studied it carefully and arrived at the best optimizing metric (a real number). Suppose also that this optimizing metric is very specific to the problem at hand and is not one of the conventional metrics, like accuracy, precision, recall, etc. It is a tailored metric that is best suited for the problem.

Given all the premises above, my question is: 1- Is the best practice in the deep learning industry to keep the cost function for training the classifier as a regular function, like cross-entropy, and use the tailored metric only as a means of ranking different classifiers, by computing this metric on the development set?

2- Or is it worth directly implementing the optimizing metric as the cost function? (In this second case, after training, the task of checking the classifiers against the dev set continues to be the same as in case 1 above.)

In other words, my question is basically about the definition of the cost function. Should I bother implementing a tailored cost function, so that it is close to the optimizing metric, or should I use the optimizing metric solely to check and rank the different classifiers on the dev set?

I think the high-level point is that you need a cost function of some sort that accurately reflects the performance of your model, because that is what drives backpropagation to train the model. Whatever you define that function to be, you need to be able to compute the gradients of that function w.r.t. each of the parameters of your model that you need to train. You can call it an "optimizing metric" or a "cost function", but it needs to be a scalar-valued function whose gradients can be used to drive backpropagation.
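To make that concrete, here is a minimal sketch of what it takes to use a tailored loss directly for training. The loss here (a weighted cross-entropy for logistic regression) is just a hypothetical stand-in for a problem-specific metric, and all names are illustrative: the essential requirements are that it returns a scalar and that its gradient w.r.t. the parameters can be computed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tailored_loss(w, X, y, sample_weights):
    """Hypothetical tailored cost: per-example weighted cross-entropy.
    Must return a scalar for it to drive gradient-based training."""
    p = sigmoid(X @ w)
    eps = 1e-12  # guard against log(0)
    return -np.mean(sample_weights * (y * np.log(p + eps)
                                      + (1 - y) * np.log(1 - p + eps)))

def tailored_loss_grad(w, X, y, sample_weights):
    """Analytic gradient of the tailored loss w.r.t. the parameters w."""
    p = sigmoid(X @ w)
    return X.T @ (sample_weights * (p - y)) / len(y)

# Tiny synthetic problem (illustrative data, not from the original post)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = (X @ true_w > 0).astype(float)
sample_weights = np.where(y == 1, 2.0, 1.0)  # e.g. positives cost more

# A few gradient-descent steps: the scalar loss and its gradient
# are all that backpropagation-style training needs.
w = np.zeros(3)
for _ in range(200):
    w -= 0.5 * tailored_loss_grad(w, X, y, sample_weights)

print(tailored_loss(w, X, y, sample_weights)
      < tailored_loss(np.zeros(3), X, y, sample_weights))
```

If the tailored metric is not differentiable (e.g. it involves hard counts or rankings), this is exactly where option 1 from the question becomes attractive: train on a differentiable surrogate such as cross-entropy and use the tailored metric only for dev-set model selection.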