How big a difference indicates overfitting?

mehmet_baki_deniz · March 31, 2023, 9:55am

Hi everybody,

As a rule of thumb, we know that a big difference between the validation and training error indicates that the model overfits the data. Yet how big is really big? If it depends, what does it depend on?
For instance, for a model that I have been working on for some time, I keep getting an RMSE of 2.1 on the training set and 3.04 on the validation set, so the validation error is about 45% higher than the training error. Does this indicate that my model overfits the data?

Mehmet

P.S. To save computation time, I had previously used a subset of the data and got an RMSE of 3.4 on the validation set. I then fed all the data to the model and got 3.04. So more data really helped, but I have no more data available.
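
To put the numbers from this post on a common scale, the gap can be expressed relative to the training error. The following is only a minimal sketch; the helper name `relative_gap` is hypothetical and does not come from the thread.

```python
def relative_gap(train_error: float, val_error: float) -> float:
    """Return how much larger the validation error is, as a fraction of the training error."""
    if train_error <= 0:
        raise ValueError("train_error must be positive")
    return (val_error - train_error) / train_error


# RMSE values quoted in the post: 2.1 (training) and 3.04 (validation)
gap = relative_gap(2.1, 3.04)
print(f"validation RMSE is {gap:.1%} higher than training RMSE")
```

A relative gap like this is easier to compare across models than the raw difference, although it does not by itself say whether the gap is acceptable for a given problem.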

saifkhanengr · March 31, 2023, 10:01am

Hi @mehmet_baki_deniz! I hope you are doing well.

I use accuracy as a metric to judge whether a model is overfitting. As there is no single answer, I use a threshold of 5%. If training accuracy is 95% and testing accuracy is 90%, I count it as a good model. But if training accuracy is 95% and testing accuracy is 89%, I will try to improve the model's performance.

Again, there is no one answer. The nature of the problem is also a major factor in setting a threshold. However, I use that 5% threshold for my problems.

Best,
Saif.
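
The 5% rule of thumb described above could be written as a small check. This is a sketch under stated assumptions: the function name `exceeds_gap_threshold` and its parameters are hypothetical, and accuracies are given in percentage points to avoid floating-point surprises at the boundary.

```python
def exceeds_gap_threshold(train_acc: float, test_acc: float,
                          threshold_pts: float = 5.0) -> bool:
    """Return True when training accuracy exceeds test accuracy by more than
    `threshold_pts` percentage points (the personal 5% threshold from the post)."""
    return (train_acc - test_acc) > threshold_pts


# The two cases mentioned in the post
print(exceeds_gap_threshold(95, 90))  # gap of exactly 5 points: within the threshold
print(exceeds_gap_threshold(95, 89))  # gap of 6 points: beyond the threshold
```

The threshold is a parameter because, as the post says, the right value depends on the nature of the problem.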

mehmet_baki_deniz · March 31, 2023, 10:21am

Hi Saif,
thank you for your response.
As this is a regression problem, I use RMSE, which is not an accuracy-based metric.

But would you use MAPE for regression problems so that you can work with percentages?
I also understand that RMSE is more widely used for model evaluation.
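
For comparison, both metrics can be computed on the same predictions. The sketch below uses NumPy; the function names `rmse` and `mape` and the toy values are made up for illustration and are not from the thread.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean squared error, in the same units as the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def mape(y_true, y_pred) -> float:
    """Mean absolute percentage error, in percent. Undefined when a target is zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when y_true contains zeros")
    return float(np.mean(np.abs((y_pred - y_true) / y_true)) * 100)


# Toy values, invented purely for illustration
y_true = [10.0, 20.0, 40.0]
y_pred = [12.0, 18.0, 44.0]

print(f"RMSE: {rmse(y_true, y_pred):.3f}")
print(f"MAPE: {mape(y_true, y_pred):.2f}%")
```

RMSE stays in the units of the target and penalizes large errors more heavily, while MAPE is scale-free but breaks down when actual values are zero or close to zero.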

saifkhanengr · March 31, 2023, 10:26am

I use the code below for my regression problems to compute error and accuracy, where AL is the predicted value and Y is the actual value:

import numpy as np

# Mean absolute percentage error; assumes Y contains no zeros
error = np.abs(AL - Y) / np.abs(Y)
avg_error = np.mean(error)
percent_error = avg_error * 100
print(f"error is {percent_error}%")
# "Accuracy" here is defined as 100% minus the percentage error
accuracy = (1 - avg_error) * 100
print(f"accuracy is {accuracy}%")

Best,
Saif.
