Summary of Course 2

I enjoyed Course 2 on MLOps, which introduced various pre-processing steps and TensorFlow Extended (TFX) components such as ExampleGen (splitting the data into training and evaluation sets), StatisticsGen (computing statistics such as the mean and standard deviation of each feature), and SchemaGen (inferring the types of the features). It then covered getting rid of correlated features through various feature selection techniques.

However, I am a bit lost as to how the above concepts relate to a production system (which is the main focus of the course). The only point I got from week 2 of the course was that, when we make a schema for the serving set, we drop the label because the label is not known in a serving set. The TensorFlow metadata library helps debug any step, artifact, or component, but that is not limited to production systems, is it?

Can anyone please help me understand, succinctly, how Course 2 is relevant to production systems? I shall be extremely grateful.

Hello @ajaytaneja,
Thanks for your kind feedback on the course!
I believe it is very hard to succinctly describe the function of every component of the TFX pipeline and the way they interact, but personally I find the description on the following page, from the linked section downwards, quite helpful for getting a sense of how the whole end-to-end system fits together:

https://www.tensorflow.org/tfx/guide#tfx_standard_components

Please note that Course 2 goes into detail only on the first components of the pipeline, and not much on the training and serving parts, which come in Courses 3 and 4. If you take these two courses, it will become clearer what the first pipeline components do in the production system and why the pipeline is set up this way.
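
To make the production angle a bit more concrete, here is a minimal sketch of one reason the early components matter later on: a schema inferred from the training data can be reused to check incoming serving requests. This is plain Python, not the TFX or TFDV API. The function names (infer_schema, serving_schema, find_anomalies), the feature names, and the toy rows are all made up for illustration.

```python
# Plain-Python stand-in for the idea behind SchemaGen and serving-time validation.
# All names here (infer_schema, serving_schema, find_anomalies) are hypothetical
# and are NOT the TFX or TFDV API. The data rows are invented toy examples.

def infer_schema(rows):
    """Record the Python type name seen for each feature in the training rows."""
    schema = {}
    for row in rows:
        for name, value in row.items():
            schema.setdefault(name, type(value).__name__)
    return schema


def serving_schema(schema, label):
    """Copy the training schema without the label, which is unknown at serving time."""
    return {name: kind for name, kind in schema.items() if name != label}


def find_anomalies(row, schema):
    """List the ways a single serving request deviates from the schema."""
    problems = []
    for name, kind in schema.items():
        if name not in row:
            problems.append(f"missing feature: {name}")
        elif type(row[name]).__name__ != kind:
            problems.append(f"{name}: expected {kind}, got {type(row[name]).__name__}")
    for name in row:
        if name not in schema:
            problems.append(f"unexpected feature: {name}")
    return problems


train_rows = [
    {"age": 34, "city": "Pune", "churned": 0},
    {"age": 51, "city": "Leeds", "churned": 1},
]
schema = serving_schema(infer_schema(train_rows), label="churned")

good_request = {"age": 40, "city": "Oslo"}
bad_request = {"age": "forty", "churned": 1}

print(find_anomalies(good_request, schema))  # no problems reported
print(find_anomalies(bad_request, schema))   # wrong type, missing and unexpected features
```

In a notebook you would probably spot a request like bad_request by eye. In a deployed system nobody is looking at each request, so a check of this kind has to be automated, which is roughly the role the schema and statistics components play in the pipeline.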

Hopefully this helps answer your question.

Good luck and best regards,
Maarten

Thank you very much; this is useful to me. Personally, I would say (please share your point of view) that the components in a TFX pipeline are useful even if I'm doing a proof of concept in a Jupyter notebook. Isn't that true? Why are we stressing that the components of the TFX pipeline are only meant for production systems?

Thanks a lot for your help