Should I prioritize improving the AUC of the training set or the validation set?

I’m training a product recommendation model to predict whether users will click on products. The model’s AUC is 70% on the training set and 57% on the validation set. In this case, should I prioritize improving the training AUC or the validation AUC? I believe the training AUC can reach 80% under normal circumstances.

Your job is to fit the data on the training set and maximise the AUC on the validation set.

Which one should be done first?

We don’t really care about maximizing the performance on the training set - we already have all those labels.

Making predictions on the validation set (and the test set) is what we’re interested in - because those sets simulate new data that wasn’t used in training.

Thank you for your reply. However, I’m worried that since the model hasn’t fully fit the training set yet, I would be optimizing validation performance before the model has learned enough. Could that effort be wasted?

Do not worry about this.

It is an iterative process: train the model, evaluate it on the validation set, adjust the model’s setup, and train again.
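
To make that loop concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration and not your actual setup: the synthetic click data from `make_data`, the small logistic regression in `train`, and the use of L2 strength as the one setting being adjusted. The point is only the shape of the process: fit on the training set each round, then compare candidate settings by validation AUC.

```python
import math
import random


def auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_data(n, rng):
    """Hypothetical click data: only the first of 10 features carries signal."""
    rows, labels = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(10)]
        p_click = 1 / (1 + math.exp(-x[0]))
        rows.append(x)
        labels.append(1 if rng.random() < p_click else 0)
    return rows, labels


def predict(w, x):
    """Predicted click probability; z is clamped to keep exp() well behaved."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    z = max(-30.0, min(30.0, z))
    return 1 / (1 + math.exp(-z))


def train(rows, labels, l2, epochs=30, lr=0.1):
    """Logistic regression fitted with plain SGD and an L2 penalty."""
    w = [0.0] * len(rows[0])
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = predict(w, x) - y
            w = [wi - lr * (err * xi + l2 * wi) for wi, xi in zip(w, x)]
    return w


rng = random.Random(0)
train_x, train_y = make_data(60, rng)
valid_x, valid_y = make_data(200, rng)

results = []
for l2 in [0.0, 0.01, 0.1, 1.0]:      # the "adjust the setup" step
    w = train(train_x, train_y, l2)    # fit on the training set only
    train_auc = auc(train_y, [predict(w, x) for x in train_x])
    valid_auc = auc(valid_y, [predict(w, x) for x in valid_x])
    results.append((l2, train_auc, valid_auc))
    print(f"l2={l2:<5} train AUC={train_auc:.3f} validation AUC={valid_auc:.3f}")

# Keep the setting with the best validation AUC, not the best training AUC.
best_l2, best_train_auc, best_valid_auc = max(results, key=lambda r: r[2])
print("selected l2:", best_l2)
```

The training AUC is still printed each round because the gap between the two numbers is useful for diagnosis, but the selection step looks only at the validation column.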