Hi,
In Week 2 of Course 4, there is the “Ungraded Lab (Optional): ETL Pipelines and Batch Predictions with Apache Beam and Tensorflow” Colab (C4_W2_Lab_4_Apache_Beam_and_Tensorflow.ipynb).
In the “Applying element-based transforms” section, I find this explanation confusing:
Looking at the code, the ParseSDF class extends beam.PTransform and not FileBasedSource. Also, the method it implements is expand() and not read_records().
The documentation of FileBasedSource does describe the read_records method, and the explanation would make sense if the beam.PTransform class used it somewhere under the hood. As far as I understand, a PTransform transforms input PCollections into output PCollections, where a PCollection represents a collection of data that could be based on a FileBasedSource. But I cannot find any relation between the two, either in the documentation or in the code of PCollection.
It took me a lot of time and exploration… Is this a mistake, or can someone explain the relation? Or am I missing something obvious?
Thank you