Cross validation?

You provided some information on cross-validation but I already knew that. I use cross-validation all the time. What I was able to take away from your response is that there is no advantage to having a single dev set except for speed (fewer training runs). Is that correct?

The thrust of my question had to do with overfitting. If you iterate on an algorithm for months, even if the algorithm doesn’t train on the dev set (or if you use cross-validation), you are manually making choices that cause the algorithm to work well on the dev set and hence you have the potential to overfit to it (by human decision-making or parameter tuning). My question was whether a single dev set has any advantages with respect to CV in this regard. In particular, if I am doing CV, do I still have to have a separate dev set that is not part of the CV to avoid the overfitting of human choices to the the hold-out sets in the CV. I understood you to be saying “no”, that CV fulfills the role of the dev set and the test set will reveal whether overfitting to the dev set by human decisions occurred.

1 Like