TFX pipelines and CT Methodology

So the whole specialization has been, at a high level, about developing CI/CD/CT knowledge, and at a low level, about using GCP and TFX to achieve this.

I understand the importance of developing good pipelines and I know how we can use TFX to ingest and transform data, train a model and deploy the model.

My question is: so what now? Once we have this pipeline in some scripts somewhere, do we just schedule it to run at some interval? For example, every month run the script that ingests all the new data, trains a model, validates it, and pushes it? Is that all it is? Or am I missing a key part?
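For context, the scheduled workflow described here (ingest, train, validate, push) can be sketched as a simple orchestration function. This is a toy sketch, not real TFX code: every function name is a hypothetical stand-in for the corresponding TFX component (ExampleGen, Trainer, Evaluator, Pusher), and the "model" is just a number so the example stays self-contained.

```python
# Hypothetical sketch of one scheduled continuous-training run.
# Each step stands in for a TFX component; the names are
# placeholders, not real TFX APIs.

def ingest_new_data():
    # Stand-in for ExampleGen pulling the latest batch of records.
    return [{"feature": i, "label": i % 2} for i in range(100)]

def train_model(data):
    # Stand-in for Trainer; here the "model" is just the label mean.
    return sum(row["label"] for row in data) / len(data)

def validate_model(model, baseline=0.4):
    # Stand-in for Evaluator: only bless the new model if it beats
    # the metric of the currently deployed baseline.
    return model >= baseline

def push_model(model):
    # Stand-in for Pusher deploying the blessed model.
    return {"deployed": True, "model": model}

def run_pipeline():
    data = ingest_new_data()
    model = train_model(data)
    if validate_model(model):
        return push_model(model)
    return {"deployed": False, "model": model}

result = run_pipeline()
print(result)  # {'deployed': True, 'model': 0.5}
```

A cron job (or a managed scheduler) would simply call `run_pipeline()` at the chosen interval; the validation gate is what keeps a bad retrain from reaching production.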

I understand, of course, that as an MLOps Engineer you would be developing different ETL pipelines and constantly exploring different models, but I feel that is somewhat separate from the gist of this course.

Does anyone have any practical experience they could share to help me build a better picture?


How often do you run a pipeline?
This boils down to the project, the budget, and the need for running the pipeline. Consider a machine learning system that predicts whether a person has an infection, deployed for something new like COVID, where the data distribution keeps shifting and the odds of a false negative (calling a sick patient healthy) are high; there I'd say run it often. On the other hand, a house price prediction model would need to run far less frequently (assuming the market is stable, at least for a few months).
You’ve got the rest of the steps regarding the workflow right.
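To make the "how often" decision concrete: instead of (or alongside) a fixed schedule, a common pattern is to trigger the pipeline when the serving data drifts away from the training data. Below is a minimal sketch of that idea using a toy mean-shift check; `drift_detected` and its `threshold` are my own invention for illustration, and a real system would compare full feature statistics (e.g. with TFDV's skew/drift comparators) rather than a single mean.

```python
import statistics

def drift_detected(train_values, recent_values, threshold=0.5):
    """Toy drift check: flag retraining when the mean of recent
    serving data has shifted by more than `threshold` standard
    deviations of the training data. A stand-in for proper
    statistics-based drift detection."""
    train_mean = statistics.mean(train_values)
    train_std = statistics.stdev(train_values)
    recent_mean = statistics.mean(recent_values)
    return abs(recent_mean - train_mean) > threshold * train_std

# Training-time house prices (in thousands) vs. recent listings.
train = [300.0, 310.0, 290.0, 305.0, 295.0]

stable = [301.0, 299.0, 306.0]
print(drift_detected(train, stable))   # False: market unchanged, skip the run

shifted = [380.0, 395.0, 410.0]
print(drift_detected(train, shifted))  # True: market moved, kick off the pipeline
```

A check like this can run cheaply every day, while the full (expensive) training pipeline only runs when the check fires, which matches the answer above: the COVID-style system would fire constantly, the stable housing model rarely.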