FileBasedSource vs. beam.PTransform in ParseSDF class


In Week 2 of Course 4, there is a Colab notebook, “Ungraded Lab (Optional): ETL Pipelines and Batch Predictions with Apache Beam and Tensorflow” (C4_W2_Lab_4_Apache_Beam_and_Tensorflow.ipynb).

In the “Applying element-based transforms” section, I find this explanation confusing:

Looking at the code, the ParseSDF class extends beam.PTransform, not FileBasedSource. Also, the method it implements is expand(), not read_records().

The documentation of FileBasedSource does list a read_records() method, and the explanation would make sense if beam.PTransform used it somewhere under the hood. As far as I understand, a PTransform transforms input PCollections into output PCollections, where a PCollection represents a collection of data and could be backed by a FileBasedSource. But I cannot find any such relation in the documentation or in the code of PCollection.
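To make concrete the kind of relation I am looking for, here is a toy sketch in plain Python. It is not Beam code and not the lab's code: ToySource, ToyTransform, ToyRead and ToyParseSDF are hypothetical stand-ins I made up. The only Beam fact I am relying on is that a source is consumed through a read transform (beam.io.Read takes a source), so a composite transform's expand() could apply such a read step, which in turn would call read_records(). The "$$$$" line is the SDF record delimiter, used here only for illustration.

```python
# Toy model in plain Python. These are NOT Beam's real classes; every name
# below is a hypothetical stand-in used to illustrate the delegation chain.


class ToySource:
    """Stand-in for a FileBasedSource subclass: knows how to produce records."""

    def __init__(self, lines):
        self.lines = lines

    def read_records(self):
        # Group lines into records; a "$$$$" line closes the current record.
        record = []
        for line in self.lines:
            record.append(line)
            if line == "$$$$":
                yield record
                record = []


class ToyTransform:
    """Stand-in for beam.PTransform: maps an input collection to an output."""

    def expand(self, collection):
        raise NotImplementedError

    def __ror__(self, collection):
        # Enables `collection | transform`, loosely mimicking Beam's pipe syntax.
        return self.expand(collection)


class ToyRead(ToyTransform):
    """Stand-in for a read transform: wraps a source and calls read_records()."""

    def __init__(self, source):
        self.source = source

    def expand(self, collection):
        # The input collection is ignored; the records come from the source.
        return list(self.source.read_records())


class ToyParseSDF(ToyTransform):
    """Composite transform: only expand() is defined here, yet read_records()
    still runs, because expand() applies a read step that wraps the source."""

    def __init__(self, lines):
        self.lines = lines

    def expand(self, collection):
        return collection | ToyRead(ToySource(self.lines))


lines = ["mol A", "$$$$", "mol B", "atoms", "$$$$"]
records = [] | ToyParseSDF(lines)
print(records)
```

If the lab's ParseSDF works along these lines, the explanation about read_records() would describe a source class used inside expand() rather than ParseSDF itself, but I could not confirm that from the notebook.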

It took me a lot of time/exploration… 🙂 Is this a mistake, or can someone explain the relation? Or am I not seeing the obvious? 🙂

Thank you


Please wait for the staff / other mentors to reply to your questions.
