Predictions using XGBoost

My understanding is that when we’re using XGBoost (in this case regression trees), every time we have new data (X_new) on which we need to make predictions, we first make those predictions. Should we then go through the iterative loop of ML development? What I mean by that is taking those new examples, adding them to the data set, splitting that data set into training and validation sets (X_train, X_valid, y_train, y_valid), and training the model again for the next round of predictions.
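
Roughly, what I have in mind is something like the sketch below (the function name is just for illustration, and it assumes the true targets y_new for the new samples eventually become available):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

def retrain_with_new_samples(X_old, y_old, X_new, y_new):
    # Add the new, now-labeled samples to the existing data set
    X_all = np.vstack([X_old, X_new])
    y_all = np.concatenate([y_old, y_new])

    # Re-split into training and validation sets
    X_train, X_valid, y_train, y_valid = train_test_split(
        X_all, y_all, test_size=0.2, random_state=42
    )

    # Train a fresh model for the next round of predictions
    model = XGBRegressor(n_estimators=500, learning_rate=0.05)
    model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)
    return model
```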

Is that approach correct?

Are you asking about new features or new samples?

You’re right, I mean new samples.

Alright. If the performance of your predictions on the new samples has not degraded, you may not need to retrain your model. But if the performance does degrade, then you may need to start with an error analysis and go through the ML development cycle afterwards. Does that make sense?
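
For example, something along these lines (hypothetical names, and the 10% tolerance is an arbitrary choice): compare the error on the new samples against the validation error you recorded when you trained the model, using the same metric in both places.

```python
def performance_degraded(model, X_new, y_new, metric, baseline_error,
                         tolerance=0.10):
    """Return True if the error on the new samples, computed with the same
    metric used at training time, is more than `tolerance` (relative) worse
    than the baseline error recorded back then."""
    new_error = metric(y_new, model.predict(X_new))
    return new_error > baseline_error * (1 + tolerance)
```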

Does degradation occur when the Mean Absolute Error is greater than the initial one?

When you train your model, you should have defined a metric that measures how good the model is, and you should use that same metric, rather than just any metric, to determine whether the model’s predictions have degraded.

If MAE is the metric you used when training, then yes. If not, then please switch to the metric you actually used. A good model does not mean that every possible metric anyone can think of is at its best value; your model can be good on metric 1 but not so good on metric 2. So we really need to stick with the metric we chose for our model in the first place.
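
As a toy illustration (the numbers are made up), two sets of predictions can look identical under MAE but quite different under MSE, so the comparison depends on which metric you committed to:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([10.0, 12.0, 11.0, 9.0])
pred_a = np.array([11.0, 13.0, 12.0, 10.0])  # off by 1 on every sample
pred_b = np.array([10.0, 12.0, 11.0, 13.0])  # perfect except one miss by 4

print(mean_absolute_error(y_true, pred_a))   # 1.0
print(mean_absolute_error(y_true, pred_b))   # 1.0
print(mean_squared_error(y_true, pred_a))    # 1.0
print(mean_squared_error(y_true, pred_b))    # 4.0
```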
