How can I tell that my ML model is overfitting?
What magnitude of difference between train and test scores indicates overfitting?
For example, my test R² is 0.92 and train R² is 0.98,
or my test R² is 0.88 and train R² is 0.95.
In these examples, the only way I can shrink the difference is to reduce the train quality. I have also tried increasing the number of examples, but that makes no change.
In fact, I am satisfied with these results given my knowledge of the problem; however, since I intend to present the results in a journal paper, I want to be confident on this point.
Are there any other ways or suggestions for deciding this?
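For reference, the basic check being discussed is simply comparing R² on the training split against R² on a held-out test split. A minimal sketch with scikit-learn, using synthetic data as a stand-in for the real dataset (the model choice and data here are assumptions, not the asker's actual setup):

```python
# Train/test R^2 gap check; synthetic regression data stands in
# for the real dataset (an assumption for illustration).
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=800, n_features=12, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
r2_train = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, model.predict(X_test))
print(f"train R2 = {r2_train:.3f}, test R2 = {r2_test:.3f}, "
      f"gap = {r2_train - r2_test:.3f}")
```

A large positive gap (train much higher than test) is the symptom everyone in this thread is arguing about how to threshold.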
In this type of situation, I check what the human level of accuracy is. If train accuracy is 98%, human accuracy is 95%, and test accuracy is 92%, then I am convinced that my model is overfitting and is not a good fit to deploy to production.
However, if train accuracy is 98%, human accuracy is 92%, and test accuracy is 95%, then I would say the model is OK (if not good). Generally, I consider a 5% difference between train and test to be OK. But this is not true all the time; the nature of the problem matters.
Human accuracy is not applicable in my problem case.
I still have doubts about how to decide that a model is overfitting the data.
You mentioned a threshold of a 5% difference based on your experience; does that mean that if the difference is 6 or 7%, it is not acceptable?
Can't we say that a difference smaller than 5% is very good, 6–8% is acceptable, and larger than 8% is bad?
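The proposed bands could be written down as a simple rule of thumb. The cut-offs below come straight from this discussion, not from any standard, so treat them as one practitioner's heuristic:

```python
def overfit_band(r2_train: float, r2_test: float) -> str:
    """Rule-of-thumb label for the train/test R^2 gap.
    Cut-offs taken from the discussion: <0.05 fine,
    0.05-0.08 acceptable, >0.08 bad."""
    gap = r2_train - r2_test
    if gap < 0.05:
        return "fine"
    elif gap <= 0.08:
        return "acceptable"
    return "bad"

print(overfit_band(0.98, 0.92))  # acceptable (gap ~ 0.06)
print(overfit_band(0.95, 0.88))  # acceptable (gap ~ 0.07)
```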
Totally agree. But what if my regression problem consists of 12 inputs and 3 outputs, with 800 examples, and after applying many algorithms I find that, among the three output parameters, the first is predicted very well, the second rather badly, and the third almost OK?
Also, the relation between the 12 input parameters and the outputs is not the same: roughly linear for the first output, nonlinear for the second and third. The significance of the inputs differs as well.
Do I need to reduce the train quality by making the model less complex, or keep adding more and more examples, to avoid overfitting?
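For a multi-output problem like this, it can help to score each output separately rather than averaging, so that the well-predicted and badly-predicted outputs are visible individually. A sketch with scikit-learn on synthetic data shaped like the description (12 inputs, 3 outputs with different input–output relations, 800 examples; the data-generating functions are invented for illustration):

```python
# Per-output train/test R^2 for a 12-input, 3-output regression.
# The synthetic outputs mimic the description: one roughly linear,
# two nonlinear (these functions are assumptions, not the real data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(800, 12))
y = np.column_stack([
    X @ rng.normal(size=12),                     # roughly linear -> easy
    np.sin(3 * X[:, 0]) + rng.normal(size=800),  # nonlinear + noise -> harder
    (X[:, :4] ** 2).sum(axis=1),                 # nonlinear but clean
])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)

# multioutput="raw_values" returns one R^2 per output instead of an average
r2_tr = r2_score(y_tr, model.predict(X_tr), multioutput="raw_values")
r2_te = r2_score(y_te, model.predict(X_te), multioutput="raw_values")
for i, (a, b) in enumerate(zip(r2_tr, r2_te)):
    print(f"output {i}: train R2 = {a:.2f}, test R2 = {b:.2f}, gap = {a - b:.2f}")
```

If one output shows a much larger gap than the others, that output may simply be noisier or harder, and a single global "5% rule" will not describe all three at once.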
It’s a value judgement that you make based on experimentation and your experience.
There is no one-size-fits-all answer to this question. Sometimes even a 5% difference is not acceptable.
If human accuracy is not applicable to your problem, then you may be working on an engineering-related problem. Can you describe your problem?
As Tom said, "It's a value judgement that you make based on experimentation and your experience." You may try this: tweak your model's architecture and data, and please update us with your findings.
Lastly, if you are solving an engineering-related problem, then you must have domain knowledge of that field.
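One concrete way to "tweak and compare" is to trace how the train/test gap moves as model complexity changes; if the gap grows with complexity while the test score stalls or falls, the extra complexity is overfitting. A sketch with scikit-learn's `validation_curve` on synthetic data (the decision-tree model and `max_depth` sweep are illustrative choices, not a prescription):

```python
# Sweep a complexity parameter and watch the train/test R^2 gap.
# Synthetic data and a decision tree are assumptions for illustration.
from sklearn.datasets import make_regression
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=800, n_features=12, noise=15.0, random_state=0)
depths = [2, 4, 8, 16]

# Cross-validated train/test scores for each candidate max_depth
train_scores, test_scores = validation_curve(
    DecisionTreeRegressor(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5, scoring="r2",
)
for d, tr, te in zip(depths, train_scores.mean(axis=1), test_scores.mean(axis=1)):
    print(f"max_depth={d:2d}: train R2 = {tr:.2f}, test R2 = {te:.2f}, "
          f"gap = {tr - te:.2f}")
```

Picking the complexity where the test score peaks, rather than where the train score peaks, is the usual way to act on such a curve.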
It is a model to predict characteristics of multi-story buildings under earthquake loading, such as the maximum displacement and the period of vibration.
So, it's an engineering-related problem. In this type of problem, it is difficult for "only an AI expert" to tell you whether that 5% difference is good or not. Kindly seek the guidance of someone who is a "domain + AI expert" — perhaps a professor or professional in a civil engineering department whose research area is AI in construction.