Hi,
How do we tell whether a given RMSE or MAE value is good enough, given that there is no overfitting?
I trained a CatBoost regressor model and received 2 MAE values for both the test and the train sets, which suggests the model doesn't overfit. All right. But what if my model is prone to underfitting? How do you tell, given that I am the only person on earth who has trained on this dataset, although the specific regression problem is well studied in both academia and industry?
Interesting question. If you are the only person on earth who has trained a particular model, then compare its performance with that of humans. If it performs worse than humans, it is a poor model, and vice versa. But this trick also depends on the nature of the problem. Want to share more about your specific problem?
Thank you very much for your response, Saif. I also suspected the best way would be to compare it with human performance. Actually, my problem is a very trivial one: predicting sales of over 300 food items in over 500 different retail stores. The problem is well studied by academics and applied in industry by firms such as Amazon.
An example dataset and problem framing would be a past Kaggle competition:
While the problem is a trivial one, no one else on earth has worked with my 12M-row dataset from a famous Turkish retailer who hired me to predict their future sales.
So I applied the CatBoost algorithm, because of the categorical variables, and obtained an RMSE of 2.17. I will try grid search to get a lower score, and maybe apply some other models to see whether a better score is possible. Yet how can I show mathematically that my prediction is good enough?
If this is not possible, then maybe I should build the best-scoring model within my capabilities, and tell the executives to ask their inventory management personnel to test it against their own manual predictions.
You have already shown mathematically (with RMSE and MAE) that your model is doing well. It is also a good idea to check some other models. Moreover, it is good practice to visualize the model predictions against the true values using plots such as scatter plots. Visualization usually convinces stakeholders better than numbers.
Furthermore, using Python or any other language, you can create a two-column table that shows the actual and predicted values side by side, and then tell the executives to ask their management to check each value manually. You can also add a third column that shows the difference between the actual and predicted values, and apply some formatting, for example highlighting a cell in red if the difference is greater than 10.
Below is the table which I created for my model that shows the first 5 and bottom 5 values of test data side by side.
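A table like the one described above could be put together along these lines. This is only a sketch in plain Python: the numbers are made up for illustration, the helper names are hypothetical, the threshold of 10 comes from the suggestion above, and the `FLAG` marker stands in for red cell highlighting, which a spreadsheet tool could do instead.

```python
# Hypothetical actual and predicted values; replace with your test data.
actual = [12.0, 30.0, 7.5, 55.0, 18.0]
predicted = [13.1, 18.5, 7.0, 54.2, 31.0]

THRESHOLD = 10  # differences larger than this get flagged


def build_rows(actual, predicted, threshold=THRESHOLD):
    """Return (actual, predicted, difference, flagged) tuples."""
    rows = []
    for a, p in zip(actual, predicted):
        diff = a - p
        rows.append((a, p, diff, abs(diff) > threshold))
    return rows


def format_table(rows):
    """Render the rows as a fixed-width text table."""
    lines = [f"{'actual':>10}{'predicted':>12}{'difference':>12}  flag"]
    for a, p, diff, flagged in rows:
        mark = "FLAG" if flagged else ""
        lines.append(f"{a:>10.2f}{p:>12.2f}{diff:>12.2f}  {mark}")
    return "\n".join(lines)


rows = build_rows(actual, predicted)
print(format_table(rows))
```

Sorting the rows by absolute difference before printing would put the worst predictions at the top, which may be the most useful view for a manual review.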
Are you suggesting that the numerical value is meaningful? That 2.17 rmse for this model and dataset can be compared to rmse from a different model and dataset? I thought that was the original question, but doesn't the numerical value depend on the scale of the dataset?
I am only suggesting that RMSE and MAE are the mathematical way to show whether a model is performing well or not. An RMSE of 2.17 can be good or bad only relative to another model's performance.
I also suggested checking other models (on the same dataset) and comparing which one performs better.
Yeah, I think that is the salient point. The answer to the thread title question, "What is a good RMSE value?" is nuanced. You can only compare values against the same dataset preprocessed the same way. You can compare whether a change in hyperparameter improved RMSE relative to a previous run. You cannot compare RMSE from your model running on retail data with my RMSE running object detection on road scenes.
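The scale dependence is easy to see with a toy example. The sketch below uses made-up numbers and a hand-written `rmse` helper (not taken from any library): expressing the same sales in a different unit changes the RMSE by the same factor, even though the predictions are exactly as good as before.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error of two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up daily sales and predictions, in units sold.
actual = [3.0, 5.0, 2.0, 8.0]
predicted = [2.5, 6.0, 2.0, 7.0]

# The same data expressed in a unit 100 times smaller.
actual_scaled = [a * 100 for a in actual]
predicted_scaled = [p * 100 for p in predicted]

print(rmse(actual, predicted))
print(rmse(actual_scaled, predicted_scaled))  # 100 times the first value
```

This is why an RMSE only means something next to another RMSE computed on the same target, in the same units, on the same rows.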
So even if your model really were underfitting, if based on your residual analysis you find that:
there are no patterns or correlations between the features and the residuals
from a business perspective, the residuals (shown as a histogram, for example) are sufficiently good with respect to a performance metric (e.g. a certain quantile or worst-case error)
then you can conclude that your model is "acceptable".
But there are of course ways to do better. On that note, you can check this thread for some inspiration on how to tackle underfitting: Training set error? - #2 by Christian_Simonis
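The two residual checks listed above can be sketched roughly as follows. Everything here is illustrative: the data is synthetic, the helper names are made up, the "model" is a deliberately underfitting stand-in, and the 95% quantile level is an assumption that would in practice come from the business side.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def abs_error_quantile(residuals, q):
    """Nearest-rank q-quantile of the absolute residuals (0 < q <= 1)."""
    ordered = sorted(abs(r) for r in residuals)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


random.seed(0)
feature = [random.uniform(0, 10) for _ in range(1000)]
actual = [2.0 * f + random.gauss(0, 1) for f in feature]

# A hypothetical model that underestimates the slope, so it underfits.
predicted = [1.5 * f for f in feature]
residuals = [a - p for a, p in zip(actual, predicted)]

# Check 1: residuals should not correlate with the feature.
print("corr(feature, residual):", round(pearson(feature, residuals), 3))

# Check 2: a business-facing error metric, e.g. the 95% absolute error.
print("95% quantile of |residual|:", round(abs_error_quantile(residuals, 0.95), 2))
```

A correlation well away from zero, as in this underfitting stand-in, suggests there is still signal in that feature that the model has not learned. A correlation near zero only rules out a linear pattern, so plotting the residuals against each feature is still worthwhile.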
Hi, I asked ChatGPT the same question.
I asked: given that my model's RMSE is 1.97 for a target with a standard deviation of 3.2, is it good enough? The answer was:
If your model's RMSE is significantly lower than the standard deviation of the target variable, then it may be considered a good model. On the other hand, if your model's RMSE is only slightly lower than the standard deviation, then it may not be performing well enough for your needs.
Then I asked whether 1.97 is significantly lower than 3.2, and it said yes. So it seems the model is good enough.
What do you guys think about comparing RMSE values with the standard deviation, or any other summary statistic, to determine whether the model is underfitting?
I haven't come across any information on evaluating model performance by comparing RMSE and standard deviation. I'm curious to hear others' thoughts on this. Let's wait and see what other folks have to say.
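One way to make the RMSE-versus-standard-deviation comparison concrete: a model that always predicts the mean of the target has an RMSE equal to the target's (population) standard deviation on that data, so RMSE/std measures the improvement over that trivial baseline, and 1 - (RMSE/std)^2 is the usual R². The sketch below plugs in the figures quoted above, on the assumption that the 1.97 and the 3.2 were computed on the same set of rows; if they were not, the result is only approximate. The helper names are made up for illustration.

```python
import math
import statistics


def rmse(actual, predicted):
    """Root mean squared error of two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r2_from_rmse(rmse_value, target_std):
    """R^2 implied by an RMSE and the target's population std."""
    return 1.0 - (rmse_value / target_std) ** 2


# Sanity check on made-up data: predicting the mean gives RMSE == std.
y = [1.0, 4.0, 2.0, 9.0, 4.0]
mean_prediction = [statistics.fmean(y)] * len(y)
baseline_rmse = rmse(y, mean_prediction)
print(baseline_rmse, statistics.pstdev(y))  # the two values agree

# Figures quoted in the thread: RMSE 1.97, target std 3.2.
print(round(r2_from_rmse(1.97, 3.2), 3))  # about 0.621
```

Under that assumption, the quoted model explains roughly 62% of the variance relative to always predicting the mean. Whether that is good enough is still a business question, and the number does not by itself rule out underfitting.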