I have a question about how we pick the max depth in Lab 2 Week 4. In the accuracy chart we can see that once max_depth goes above 4, the train and validation accuracy diverge quite a bit. Despite that, in the lab we pick a max_depth of 16 (because that’s where the accuracy reaches its max).
I thought the big gap between the validation and train accuracy would indicate overfitting, so we would stick to a max_depth of 4. Is it a bit of a judgement call, or is there another reason to go for 16?
That’s because we look for the point where the accuracy on the training set and on the test (validation) set is both high and close in value, so that we avoid overfitting.
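As a rough sketch of that "high and close in value" rule (not the lab's actual code): keep only the depths whose train/validation gap is within some tolerance, then take the one with the best validation accuracy. The function name `pick_depth`, the `max_gap` tolerance, and every number in `results` are made up for illustration; they are not the lab's results.

```python
def pick_depth(results, max_gap=0.02):
    """Return the depth with the best validation accuracy among candidates
    whose train/validation gap is at most max_gap, or None if none qualify."""
    candidates = [
        (depth, train_acc, val_acc)
        for depth, train_acc, val_acc in results
        if train_acc - val_acc <= max_gap
    ]
    if not candidates:
        return None
    # Highest validation accuracy wins; ties go to the shallower tree.
    best = max(candidates, key=lambda r: (r[2], -r[0]))
    return best[0]


# Hypothetical (depth, train accuracy, validation accuracy) values,
# invented for this sketch and NOT taken from the lab.
results = [
    (1, 0.80, 0.79),
    (2, 0.84, 0.83),
    (4, 0.88, 0.87),
    (8, 0.95, 0.88),
    (16, 1.00, 0.89),
]

chosen = pick_depth(results, max_gap=0.02)
print(chosen)
```

How large `max_gap` should be is a judgement call. With a very loose tolerance the rule reduces to "pick the highest validation accuracy", which on these made-up numbers would select the deepest tree.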
Thanks for the quick response, Abdelrahman. You mentioned “high and close in value”. In the example you shared, the accuracy is 0.875 for validation and 0.878 for train - which is indeed close in value. But for the example I shared, the accuracy is 0.89 for validation and 1.00 for train. Can that still be considered close in value?
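To put numbers on "close in value", here is a quick check of the train/validation gap for the two pairs quoted above. The dictionary keys are just labels chosen for this sketch.

```python
# (train accuracy, validation accuracy) pairs quoted in the post above.
examples = {
    "reply_example": (0.878, 0.875),
    "asker_example": (1.00, 0.89),
}

# Gap between train and validation accuracy, rounded to hide float noise.
gaps = {name: round(train - val, 3) for name, (train, val) in examples.items()}

for name, gap in gaps.items():
    print(f"{name}: gap = {gap:.3f}")
```

The first gap is 0.003 and the second is 0.11, so the second is more than thirty times larger.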
Sorry if it seems like I’m nitpicking. I just want to make sure I can make the right call when building my own models.