Why cross-validation?

It’s not quite clear to me why we use cross-validation instead of just testing all the models and picking the one with the lowest mean cost as our optimal model. Doing a whole cross-validation sequence just seems unnecessary.

  • Cross-validation provides a more robust estimate of a model’s performance than a single train-test split. It helps in evaluating how well the model generalizes to unseen data.
  • Helps to determine whether your model suffers from underfitting or overfitting.
  • Can assist in selecting the best model or hyperparameters for the task at hand (see the sketch after this list).
    If you check the optional videos on the skewed dataset [most probably in week 2], you will see the importance of cross-validation.
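
Here is a minimal sketch of that model-selection use case, assuming scikit-learn and a simple regression setup (the dataset, the polynomial-degree candidates, and the scoring metric are all just illustrative choices, not something from the course):

```python
# Compare candidate models with cross-validation instead of a single split.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_regression(n_samples=200, n_features=1, noise=10.0, random_state=0)

# Candidate models: polynomial regressions of increasing flexibility.
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # 5-fold cross-validation; sklearn reports negative MSE for this scoring.
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    print(f"degree={degree:2d}  mean CV MSE={-scores.mean():.1f}  std={scores.std():.1f}")
```

The mean cross-validated cost (and its spread across folds) gives a much more trustworthy basis for picking a model than the cost on one particular split.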

Pardon me if I’ve made any mistakes.
Thank you :smiley:

In a word: "overfitting".

Keep in mind that the goal is to get a model that makes good predictions on new data. The goal is not just getting a low cost on the training set - that has little value on its own, since we already have labels for the training set and don’t need predictions there at all.
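
To illustrate that point, here is a small sketch (again assuming scikit-learn; the synthetic data and the deliberately over-flexible model are just for demonstration) showing that a very low training cost tells you little about performance on new data:

```python
# A model can fit the training set almost perfectly yet predict poorly on new data.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_regression(n_samples=60, n_features=1, noise=15.0, random_state=1)
X_train, X_new, y_train, y_new = train_test_split(X, y, test_size=0.5, random_state=1)

# A very flexible model fit only on the training set.
model = make_pipeline(PolynomialFeatures(degree=20), LinearRegression())
model.fit(X_train, y_train)

print("training MSE:", mean_squared_error(y_train, model.predict(X_train)))
print("new-data MSE:", mean_squared_error(y_new, model.predict(X_new)))
# Typically the training MSE is far lower than the new-data MSE here,
# which is exactly why low training cost alone is not the goal.
```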

In addition to @ahs95's great reply:

In that case you run quite a risk of survivorship bias. With cross-validation you can test several different splits and more variation of the data, which usually helps prevent overfitting compared to the scenario you outlined (see the sketch below). Hope that helps, @Jules_Gransden.
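
For intuition, here is a hedged sketch (scikit-learn assumed; the classifier and dataset are arbitrary placeholders) of how k-fold cross-validation evaluates the model on several different splits of the same data:

```python
# Each fold trains on a different subset and validates on the held-out part,
# so the performance estimate is not tied to one lucky (or unlucky) split.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=150, n_features=5, random_state=0)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

for fold, (train_idx, val_idx) in enumerate(kf.split(X), start=1):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    acc = model.score(X[val_idx], y[val_idx])
    print(f"fold {fold}: validation accuracy = {acc:.3f}")
# Every example lands in a validation fold exactly once.
```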

Best regards
Christian

Here are two threads I would recommend taking a look at:

Best regards
Christian