Training strategy as more real data becomes available

whnr · May 8, 2021, 6:59pm

Hi!

I was wondering what the typical approach is as more real data from the field becomes available.

Let’s say the model performs well enough after training on the initial training data to be deployed in the field.
Now the real world is generating more data, and we are busy checking its labels manually.
How do you integrate this data into the network?

Would I just add it to the training set and keep training from the current best model? My intuition says that new data will lead to slightly different gradients and therefore to the potential to improve the model over time.

–Flo

carloshvp · May 12, 2021, 5:52pm

Hi @whnr ,
I believe you raised some really good points for discussion. It is important to check whether the most recent data differs from the original data used for training. If it does, this is called data drift. If you can establish that the new data differs from the original, you should either train on only the new data (if it is radically different) or add the new data to the original dataset and retrain on the combined set.
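
As a rough illustration of such a check, here is a minimal sketch that compares one numeric feature between the original training data and the recent field data using a two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical distributions). The function names `ks_statistic` and `has_drifted` and the threshold of 0.3 are assumptions made up for this example, not something from the course; a real check would look at many features and use a proper significance test.

```python
from bisect import bisect_right


def ks_statistic(reference, recent):
    """Largest gap between the empirical CDFs of two samples (0 = identical, 1 = disjoint)."""
    ref = sorted(reference)
    new = sorted(recent)
    largest_gap = 0.0
    # The gap between two step functions can only peak at an observed value.
    for x in ref + new:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_new = bisect_right(new, x) / len(new)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_new))
    return largest_gap


def has_drifted(reference, recent, threshold=0.3):
    """Flag drift when the gap exceeds a threshold (0.3 is an arbitrary illustrative cut-off)."""
    return ks_statistic(reference, recent) > threshold


# Hypothetical feature values, e.g. mean image brightness per sample.
training_feature = [0.40, 0.45, 0.50, 0.55, 0.60]
field_feature = [0.70, 0.75, 0.80, 0.85, 0.90]
drift_detected = has_drifted(training_feature, field_feature)
```

If `drift_detected` is true, that would be a hint to look more closely at how the field data differs before deciding how to retrain.
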
I hope this makes sense.

whnr · May 12, 2021, 9:31pm

Yes! Thank you @carloshvp,

Course 4 / Week 2 had some more examples and the programming assignment about transfer learning (Alpaca recognition). That gave me a good intuition about what is needed or how this could work.

manifest · May 13, 2021, 6:39am

I'll just add a little to what Carlos said.

If there isn’t any significant concept drift in the new data, you would do well to simply continue training your existing model on that new data. The pretraining usually helps learning and results in reduced variance.

In this case, we prefer not to reuse the original data in the training process, because repeating examples the model has already seen introduces bias.
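
To make the idea of continuing training concrete, here is a minimal sketch using a tiny one-feature logistic regression written in plain Python rather than a real deep learning framework. Everything in it is assumed for illustration: the helper names `sgd_epochs` and `log_loss`, the toy datasets, and the learning rates. The point is only that the second training run starts from the already learned parameters instead of from scratch, and sees only the new field data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sgd_epochs(params, examples, epochs, lr):
    """Run plain SGD on a one-feature logistic regression, starting from `params`."""
    w, b = params
    for _ in range(epochs):
        for x, y in examples:
            error = sigmoid(w * x + b) - y  # gradient of the log loss w.r.t. the logit
            w -= lr * error * x
            b -= lr * error
    return w, b


def log_loss(params, examples):
    """Mean binary cross-entropy of the model on `examples`."""
    w, b = params
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in examples:
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(examples)


# Toy data (made up): the field data has a slightly shifted decision boundary.
initial_data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
field_data = [(0.2, 0), (0.4, 0), (0.8, 1), (1.0, 1)]

# 1) Initial training from scratch on the original data.
pretrained = sgd_epochs((0.0, 0.0), initial_data, epochs=50, lr=0.1)

# 2) Continue from the pretrained parameters, on the new data only,
#    with a smaller learning rate so the earlier fit is adjusted gently.
fine_tuned = sgd_epochs(pretrained, field_data, epochs=200, lr=0.05)

loss_before = log_loss(pretrained, field_data)
loss_after = log_loss(fine_tuned, field_data)
```

Comparing `loss_before` and `loss_after` shows how much the extra training helped on the field data. To try the combined-dataset variant Carlos mentioned instead, you could pass `initial_data + field_data` to the second call.
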
