How do we understand if a given RMSE or MAE value is good enough, given there is no overfitting?
I trained a CatBoost regressor model and got MAE values for the train and test sets that are close to each other, which suggests the model doesn't overfit. All right. But what if my model is prone to underfitting? How do you detect that, given that I am the only person on earth who has trained on this dataset, although the specific regression problem is well studied in both academia and industry?
Interesting question. If you are the only person on earth who trained a particular model, then compare its performance with humans. If its performance is poorer than the humans', it is a poor model, and vice versa. But this trick also depends on the nature of the problem. Wanna share more about your specific problem?
Thank you very much for your response, Saif. I also suspected the best way would be comparing it with human performance. Actually my problem is a very common one: predicting sales of over 300 food items in over 500 different retail stores. The problem is well studied by academics and applied in the industry by firms such as Amazon and the like.
An example dataset and problem framing would be a past Kaggle competition:
While the problem is a common one, no one else on earth has worked with my 12M-row dataset from a famous Turkish retailer who hired me to predict their future sales.
So I applied the CatBoost algorithm because of the categorical variables and obtained an RMSE of 2.17. I will try grid search to get a lower score, and maybe apply some other models to see whether a better score is possible. Yet how can I mathematically show that my prediction is good enough?
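For reference, both metrics are straightforward to compute from the predictions, and checking them on the train and test splits side by side is the usual way to spot under- or overfitting (a similarly high error on both splits hints at underfitting). A minimal sketch with made-up numbers (`y_test` and `y_pred` are placeholders, not the real 12M-row data):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

# hypothetical sales figures for illustration
y_test = [10.0, 12.0, 9.0, 11.0]
y_pred = [9.5, 12.5, 8.0, 11.5]

print(rmse(y_test, y_pred))  # ~0.661
print(mae(y_test, y_pred))   # 0.625
```

Running the same two functions on the train predictions and comparing the pairs of numbers is the underfitting/overfitting check in one place.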
If this is not possible, then maybe I should build the best-scoring model I can, and tell the executives to ask their inventory management personnel to test it against their own manual predictions.
You already show mathematically (RMSE and MAE) that your model is doing well. It is also a good idea to check some other models. Moreover, it is good practice to visualize the model predictions against the true values using plots such as scatter plots. Usually, visualization convinces stakeholders better than numbers.
Furthermore, using Python or any other language, you can create a two-column table that shows the actual and predicted values side by side, and then tell the executives to ask their management to check each value manually. You can also add a third column that shows the difference between the actual and predicted values, and apply some formatting, for example highlighting a cell in red if the difference is greater than 10.
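A rough sketch of such a table with pandas (the values are invented for illustration; the threshold of 10 follows the example above):

```python
import pandas as pd

# made-up actual vs. predicted sales for illustration
df = pd.DataFrame({
    "actual":    [100, 95, 80, 120, 60],
    "predicted": [ 97, 99, 92, 118, 63],
})

# third column: absolute difference between actual and predicted
df["difference"] = (df["actual"] - df["predicted"]).abs()

# flag the rows worth a manual look (difference greater than 10)
df["flag"] = df["difference"] > 10

# first and last rows side by side, like the table below
report = pd.concat([df.head(2), df.tail(2)])
print(report)
```

For the red highlighting itself, pandas' `Styler` (`df.style`) can apply a background color to the flagged cells when exporting the table to HTML or Excel for the stakeholders.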
Below is the table I created for my model, showing the first 5 and last 5 rows of the test data side by side.
Are you suggesting that the numerical value is meaningful on its own? That the 2.17 RMSE for this model and dataset can be compared to an RMSE from a different model and dataset? I thought that was the original question, but doesn't the numerical value depend on the scale of the dataset?
Yeah, I think that is the salient point. The answer to the thread title question, ‘What is a good RMSE value?’ is nuanced. You can only compare values against the same dataset preprocessed the same way. You can compare whether a change in hyperparameter improved RMSE relative to a previous run. You cannot compare RMSE from your model running on retail data with my RMSE running object detection on road scenes.
Hi, I asked the same question to ChatGPT.
I asked: given that my model's RMSE is 1.97 for a target with a standard deviation of 3.2, is it good enough? The answer was:
If your model’s RMSE is significantly lower than the standard deviation of the target variable, then it may be considered a good model. On the other hand, if your model’s RMSE is only slightly lower than the standard deviation, then it may not be performing well enough for your needs.
Then I asked whether 1.97 is significantly lower than 3.2, and it said yes. It seems the model is good enough.
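One way to make this comparison less hand-wavy: a baseline that predicts the target mean for every row has an RMSE equal to the target's standard deviation, so the ratio RMSE/std says how far below that naive baseline the model is, and 1 − (RMSE/std)² is essentially the R² score. With my numbers (assuming the std was computed on the same evaluation set as the RMSE):

```python
rmse = 1.97
std = 3.2   # standard deviation of the target

ratio = rmse / std   # below 1 means better than predicting the mean
r2 = 1 - ratio ** 2  # equivalent to the usual R^2 score

print(round(ratio, 3))  # 0.616
print(round(r2, 3))     # 0.621
```

So the model explains roughly 62% of the target variance, which is a scale-free way to read the same pair of numbers.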
What do you guys think about comparing RMSE values with the standard deviation, or other summary statistics, for the purpose of determining whether the model is underfitting?
I haven’t come across any information on evaluating model performance by comparing RMSE and standard deviation. I’m curious to hear others’ thoughts on this. Let’s wait and see what other folks have to say.