In Week 2 of Course 4, there is the “Ungraded Lab (Optional): ETL Pipelines and Batch Predictions with Apache Beam and Tensorflow” colab.
In the “Applying element-based transforms” section, I find the explanation confusing.
Looking at the code, the ParseSDF class extends beam.PTransform and not FileBasedSource. Also, the method implemented is expand() and not the one the explanation describes. The documentation of FileBasedSource does have the read_records method, and the explanation would make sense if beam.PTransform used it somewhere under the hood. As far as I understand, a PTransform transforms input PCollections into output PCollections, where a PCollection represents a collection of data and could be based on a FileBasedSource. But I cannot find any such relation in the documentation or in the code of PCollection.
It took me a lot of time/exploration… Is this a mistake or can someone explain the relation? Or am I not seeing the obvious?