Why does my AI model perform well on training data but poorly on new data?

Hi everyone,

I am currently learning AI and practicing by training simple models. I noticed that my model gives good results during training, but when I test it on new or unseen data, the performance drops.

I don't fully understand why this happens. Is this something common in machine learning?

I would really appreciate it if someone could explain:

  • Why does this happen?

  • What mistakes do beginners usually make?

  • What simple steps can I take to improve performance on new data?

I am still a beginner, so simple explanations would really help me.

Thank you :blush:

What course are you attending? This is a pretty fundamental issue that is discussed in most courses.

We need more information. Could you describe your model, your data, and how you split it more specifically?

Yes, this is a very common problem, as Tom said. There are lots of ways this can happen, but the usual one for beginners is just that your training dataset is too small and not a realistic representation of all the possible inputs your model needs to handle.
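To make the small-dataset explanation concrete, here is a minimal toy sketch in Python with NumPy. It is my own illustration, not taken from any course. It assumes a made-up "true" relationship y = sin(2πx) plus Gaussian noise and fits a degree-9 polynomial. The function names and numbers (10 vs. 200 training points, noise level 0.2, degree 9) are arbitrary choices for illustration. The third case trains only on inputs in [0, 0.5], mimicking a training set that is not a realistic representation of all the inputs the model later sees. The pattern of low training error but high error on new data is usually called overfitting.

```python
import numpy as np


def true_function(x):
    # Hypothetical "real" relationship the model is supposed to learn.
    return np.sin(2 * np.pi * x)


def make_data(n, rng, low=0.0, high=1.0, noise=0.2):
    # Sample n inputs uniformly from [low, high] and add Gaussian label noise.
    x = rng.uniform(low, high, size=n)
    y = true_function(x) + rng.normal(0.0, noise, size=n)
    return x, y


def mse(y_true, y_pred):
    # Mean squared error between targets and predictions.
    return float(np.mean((y_true - y_pred) ** 2))


def train_and_evaluate(n_train, degree=9, train_high=1.0, n_test=1000, seed=0):
    """Fit a polynomial on training data; return (train MSE, new-data MSE).

    The new (test) data always covers the full input range [0, 1].
    train_high restricts training inputs to [0, train_high] to mimic a
    training set that does not represent all the inputs seen later.
    """
    rng = np.random.default_rng(seed)
    x_train, y_train = make_data(n_train, rng, high=train_high)
    x_test, y_test = make_data(n_test, rng)  # data the model never saw

    model = np.polynomial.Polynomial.fit(x_train, y_train, deg=degree)
    return mse(y_train, model(x_train)), mse(y_test, model(x_test))


if __name__ == "__main__":
    cases = {
        "10 points, full range": dict(n_train=10),
        "200 points, full range": dict(n_train=200),
        "200 points, only x in [0, 0.5]": dict(n_train=200, train_high=0.5),
    }
    for name, kwargs in cases.items():
        train_err, test_err = train_and_evaluate(**kwargs)
        print(f"{name:32s} train MSE={train_err:.4f}  new-data MSE={test_err:.4f}")
```

With only 10 points, a degree-9 polynomial can pass through every training example, so the training error is close to zero while the error on new data is much larger. With more, and more representative, training data the two numbers should end up much closer. Exact values depend on the random seed.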

This is not a simple topic, so rather than having us try to reproduce the information here, it would be better if you took the Deep Learning Specialization (DLS) and learned all that Professor Andrew Ng has to say on this and related topics. He covers all of this in a lot of depth, and your time will be well spent. Start with DLS Course 1 and Course 2, which should help with your understanding of the issues. Professor Ng goes on to add more information in C3, C4 and C5.