In week 4’s optional lab titled “Tree Ensemble”, it was mentioned that:
The model is returned at its last state when training terminated, not its state during the best round. For example, if the model stops at round 26, but the best round was 16, the model’s training state at round 26 is returned, not round 16.
Note that this is different from returning the model’s “best” state (from when the evaluation metric was the lowest).
It seems to me that we are then choosing a state with a higher loss, and hence lower performance, instead of the state at round 16, which has a lower loss. Is my understanding correct? And what is the rationale for implementing it this way?
If the model stops at round 26, it contains 26 trees. Because boosting adds trees one at a time, the first 16 trees are exactly the model as it was at round 16. So when we make predictions, we can choose to use only the first 16 trees and ignore the last 10; in this way we are choosing to use the best state.
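To make this concrete, here is a minimal sketch in plain Python. It is a toy gradient-boosting loop with one-split stumps, not XGBoost itself; the data, the learning rate, the patience of 10 rounds, and the helper names (`fit_stump`, `predict`, `mse`) are all assumptions made up for illustration. In XGBoost, recent versions expose a `best_iteration` attribute and an `iteration_range` argument to `predict` for the same purpose, but check the documentation for the version you have.

```python
import math
import random

def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals (least squares)."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]  # (threshold, left value, right value)

def predict(trees, x, n_trees=None, lr=0.5):
    """Sum the contributions of the first n_trees trees (all trees if None)."""
    use = trees if n_trees is None else trees[:n_trees]
    return sum(lr * (lm if x <= t else rm) for t, lm, rm in use)

def mse(trees, xs, ys, n_trees=None):
    return sum((predict(trees, x, n_trees) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical noisy data: y = sin(x) + noise
random.seed(0)
x_train = [random.uniform(0, 6) for _ in range(40)]
y_train = [math.sin(x) + random.gauss(0, 0.3) for x in x_train]
x_val = [random.uniform(0, 6) for _ in range(40)]
y_val = [math.sin(x) + random.gauss(0, 0.3) for x in x_val]

trees, val_losses = [], []
patience = 10  # stop once the validation loss has not improved for 10 rounds
for _ in range(200):
    residuals = [y - predict(trees, x) for x, y in zip(x_train, y_train)]
    trees.append(fit_stump(x_train, residuals))
    val_losses.append(mse(trees, x_val, y_val))
    best_n = val_losses.index(min(val_losses)) + 1  # number of trees at the best round
    if len(trees) - best_n >= patience:
        break

# The returned ensemble holds every tree, including those added after the best round.
last_loss = mse(trees, x_val, y_val)                  # uses all trees
best_loss = mse(trees, x_val, y_val, n_trees=best_n)  # uses only the first best_n trees
print(f"trees kept: {len(trees)}, best round: {best_n}")
print(f"validation MSE with all trees:   {last_loss:.4f}")
print(f"validation MSE with best_n trees: {best_loss:.4f}")
```

Nothing is lost by returning the last state: the best state is still recoverable by limiting how many trees are used at prediction time.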