Is the implementation of xgboost as easy as it is in the lecture?

hi everybody,
Andrew argues that xgboost is the best possible model for structured data and goes on to say that people have won Kaggle competitions using xgboost.

Then he shows the implementation of xgboost as the following:


just 4 lines of code ? :smiley:
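
Roughly, the lines he shows look something like this (my own reconstruction using xgboost's scikit-learn-style API; `X_train`, `y_train`, `X_test` are placeholder names, not taken from the slide):

```python
from xgboost import XGBClassifier

# Create the model with default settings
model = XGBClassifier()

# Fit on the training data, then predict on new data
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```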
and… you are the winner in a Kaggle competition…?
he also argues that the library figures out how to regularize the model against overfitting.
so should I just go to Kaggle and write these 4 lines of code? :slight_smile:

isn’t there anything else to consider, as there was for linear regression or neural networks?

Mehmet

The trick to Kaggle competitions is in trying a zillion different combinations of layers and methods until you find the one that gets the highest rankings on that specific task.

Those who have experience can pare down the number of things they try into more likely areas of success.

It’s extremely unlikely you would win a Kaggle competition by using just one instance of any specific method.

hi tmosh,
thank you for your response. Actually my question is not about ‘how to win a competition in Kaggle’.
sorry for the confusion. Definitely my bad!

I am rather asking what issues there are to consider when building an xgboost model.
Andrew’s lectures, I believe, did not give any hint on that matter, as opposed to his discussion of other models. More concretely, do we really not need to regularize it, as Andrew implies? Don’t we check for bias and variance? More or fewer features in the model? Maybe even polynomial features?

I just presume that if xgboost is powerful enough to bring you a Kaggle win, there has to be more to it than the 4 lines of code that appeared in Andrew’s lecture.

warmly, Mehmet

Perhaps someone else with more xgboost experience would like to reply.


Hi @mehmet_baki_deniz yes, Xgboost has several parameters that can be tuned to improve the model’s performance: the learning rate, the number of trees, and a few others. However, most of the improvement usually comes from feature engineering and improving the quality of the data.

So, I would say the other side of the picture is the quality of the data that you use for your model and some hyperparameter tuning.
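
For example, here is a rough sketch of what that tuning might look like with scikit-learn's GridSearchCV (the parameter values are only illustrative, and `X_train`/`y_train` are placeholders):

```python
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV

# Candidate values for a few of the most influential hyperparameters
param_grid = {
    "n_estimators": [100, 300, 500],    # number of trees
    "learning_rate": [0.01, 0.1, 0.3],  # step-size shrinkage
    "max_depth": [3, 6, 9],             # tree depth controls model complexity
}

# Cross-validated search over the grid
search = GridSearchCV(XGBClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print(search.best_params_)
print(search.best_score_)
```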
