Question about training and re-training

Training a model needs people specialized in its subject area to provide good training data. For instance, a model that predicts diseases from X-rays will need training data pre-processed by experts in that field.

What about retraining? Is retraining a task that can be automated, or will there always be a need for experts in the subject to keep a model properly tuned? Or does it depend on the model?

And this leads to a common question: will ML eliminate jobs, or will new categories of jobs appear, like people specialized in training and maintaining ML models?

I am not sure if this question fits in this course.

Thank you!

Juan

Hi, @Juan_Olano!

Yes, retraining can be automated. That is actually kind of mandatory when dealing with real-time predictions for, say, energy generation or stock prices. In those cases, predictions have a strong dependency on previous results, so regular retraining is advisable.
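
To give an idea of what that can look like, here is a minimal sketch of one retraining cycle in Python. The `fetch_recent_window` helper and the synthetic data are just my placeholders for a real data feed, and in practice the whole thing would be triggered on a schedule (cron, Airflow, etc.):

```python
# Minimal sketch of an automated retraining cycle for a time-series-style model.
# Everything here is illustrative: synthetic "recent" data stands in for a real feed.
import numpy as np
from sklearn.linear_model import Ridge

def fetch_recent_window(n=500, seed=0):
    """Stand-in for pulling the latest labeled observations (e.g. energy demand)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 4))                                  # lagged features
    y = X @ np.array([0.5, -1.0, 0.3, 2.0]) + rng.normal(scale=0.1, size=n)
    return X, y

def retrain():
    """One retraining cycle: refit the model on the most recent window of data."""
    X, y = fetch_recent_window()
    return Ridge(alpha=1.0).fit(X, y)

model = retrain()   # in production, a scheduler would call this periodically
print(model.coef_)
```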

In my view, the AI/DL era will affect the job market like any other “technological revolution”. Some jobs will disappear, some will emerge, and a lot will undergo some kind of change. That’s why platforms like DeepLearning.AI are so useful for everyone who wants to grasp the basics underneath this technology and start a career in this field.

@alvaroramajo thank you for the reply!

Regarding re-training, could there be cases where it cannot be automated? I am particularly thinking about medical diagnostics based on X-rays: can new data distributions be taught to the models automatically, or will cases like this one always require humans to label new examples before re-training a model?

Thanks!

Juan

I think we have different concepts of what automated retraining is.
Do you mean the model auto-labels new data and then retrains with those new examples? If so, the model is not going to learn any additional information from the new data, in the same way a person cannot be the teacher and the student at the same time.

You will always need human labels as a source of ground truth to build knowledge about the new data distribution.

Good day @alvaroramajo, thank you for taking the time to share your knowledge.

It is possible that we have different concepts of what automated retraining is, yes. I’ll investigate more about this.

In my head, retraining with new data sets requires those data sets to be labeled so that the model learns new things. I would never expect the model to auto-label, but instead we need human experts to label new data. In that sense, I think that “automatic retraining” is not possible, as there’s a manual step (labeling).

If I may ask a bit more: if you have a link to an article where automatic retraining, or the general concept of training, is discussed, I’d appreciate it.

Thanks!

Juan

In my head, retraining with new data sets requires those data sets to be labeled so that the model learns new things […]. In that sense, I think that “automatic retraining” is not possible, as there’s a manual step (labeling).

Exactly. In supervised learning you will always need those labels, and the only* possible way to be sure of their validity is manual labelling. (*) If the data is extracted from, say, a simulated environment, you already know the ground truth.
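
As a toy illustration of that footnote (the projectile setup below is mine, purely for illustration): when the data comes out of a simulator, the label is produced together with the features, so there is nothing left for a human to annotate.

```python
# When data comes from a simulation, the "labels" are known by construction.
import numpy as np

def simulate_landing_distance(v0, angle_deg, g=9.81):
    """Exact landing distance of a projectile for a given launch speed and angle."""
    angle = np.deg2rad(angle_deg)
    return (v0 ** 2) * np.sin(2 * angle) / g

rng = np.random.default_rng(42)
speeds = rng.uniform(5, 50, size=1000)
angles = rng.uniform(10, 80, size=1000)

X = np.column_stack([speeds, angles])           # features
y = simulate_landing_distance(speeds, angles)   # exact ground truth, no labelling needed
print(X.shape, y.shape)
```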

Another thing is self-supervised learning, which is the kind of training in which labels as such are not needed: the targets come from the data itself. For instance, the large language model GPT-3 was trained on that principle: they just took a vast amount of raw text and the task was predicting held-out words (in GPT-3’s case, the next word) from the surrounding text. That’s the reason they could use such an absurd amount of data.
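
A rough sketch of that idea with a toy sentence of mine (nothing like the real GPT-3 pipeline): the targets are extracted directly from the raw text, so no human labelling is involved.

```python
# Self-supervised language modelling in miniature: each word serves as the
# "label" for the words that precede it, so the labels come for free.
text = "the model learns to predict the next word from the words before it"
tokens = text.split()

pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]  # (context, target)

for context, target in pairs[:3]:
    print(context, "->", target)
# ['the'] -> model
# ['the', 'model'] -> learns
# ['the', 'model', 'learns'] -> to
```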

I would never expect the model to auto-label, but instead we need human experts to label new data

Coming back to supervised learning, having the model pre-label new data is a common practice to ease and accelerate the labelling process, as it’s faster to correct the missing or misclassified labels than to label everything from scratch.
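
Something along these lines, for example. The model, data and the 0.9 confidence threshold below are placeholders of mine, not a prescribed recipe: the current model suggests labels for the new batch and only the low-confidence cases are routed to the human experts.

```python
# Sketch of model-assisted labelling: pre-label new data with the current model
# and send only the uncertain examples for manual review.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Pretend "old" labeled data and a fresh batch of unlabeled examples.
X_old, y_old = make_classification(n_samples=500, n_features=10, random_state=0)
X_new, _ = make_classification(n_samples=100, n_features=10, random_state=1)

model = LogisticRegression(max_iter=1000).fit(X_old, y_old)

proba = model.predict_proba(X_new)
pre_labels = proba.argmax(axis=1)     # model's suggested labels
confidence = proba.max(axis=1)

needs_review = confidence < 0.9       # only these go to the human experts
print(f"{needs_review.sum()} of {len(X_new)} examples flagged for manual review")
```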