FileBasedSource vs. beam.PTransform in ParseSDF class

arvyzukai · October 20, 2023, 9:57am

Hi,

In Week 2 of Course 4, there is the Colab “Ungraded Lab (Optional): ETL Pipelines and Batch Predictions with Apache Beam and Tensorflow” (C4_W2_Lab_4_Apache_Beam_and_Tensorflow.ipynb).

In the “Applying element-based transforms” section, I find this explanation confusing:

Looking at the code, the ParseSDF class extends beam.PTransform, not FileBasedSource. Also, the method it implements is expand(), not read_records().

The documentation of FileBasedSource does list the read_records method, and the explanation would make sense if the beam.PTransform class used it under the hood. As far as I understand, a PTransform transforms input PCollections into output PCollections, where a PCollection represents a collection of data and could be based on a FileBasedSource. But I cannot find any such relation in the documentation or in the code of PCollection.
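To make the question concrete, here is a toy sketch of the kind of relation I would expect if the explanation were right: a composite transform whose expand() hands off to a read step, which in turn calls read_records() on a source. Everything in it (ToyFileBasedSource, ToyLineSource, ToyRead, ToyParse, and the in-memory "files" dict) is a hypothetical stand-in written in plain Python, not Beam code and not the lab's ParseSDF. As far as I know, in real Beam the transform that wraps a source is beam.io.Read, but I could not confirm that the lab's ParseSDF uses it.

```python
# Toy stand-ins only: these classes imitate the *shape* of the API discussed
# above. None of them are apache_beam classes.


class ToyFileBasedSource:
    """Stand-in for a file-based source; subclasses implement read_records()."""

    def __init__(self, files):
        # Hypothetical: a dict mapping file name -> text contents,
        # used here instead of touching the file system.
        self.files = files

    def read_records(self, file_name):
        raise NotImplementedError

    def read_all(self):
        # The base class drives the iteration and calls read_records()
        # once per file.
        for name in sorted(self.files):
            yield from self.read_records(name)


class ToyLineSource(ToyFileBasedSource):
    """Concrete source: one record per non-empty line."""

    def read_records(self, file_name):
        for line in self.files[file_name].splitlines():
            if line:
                yield line


class ToyRead:
    """Stand-in for a read transform: turns a source into a collection."""

    def __init__(self, source):
        self.source = source

    def expand(self, _pipeline_input):
        return list(self.source.read_all())


class ToyParse:
    """Stand-in for a composite transform: implements expand() only,
    and delegates the actual reading to ToyRead + a source."""

    def __init__(self, files):
        self.files = files

    def expand(self, pipeline_input):
        return ToyRead(ToyLineSource(self.files)).expand(pipeline_input)


records = ToyParse({"a.txt": "x\ny\n", "b.txt": "z\n"}).expand(None)
print(records)
```

If the lab's class worked like ToyParse, the explanation would be talking about the inner source while the visible class only defines expand(). If there is no inner source at all, the explanation may simply describe a different version of the code.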

It took me a lot of time/exploration… Is this a mistake or can someone explain the relation? Or am I not seeing the obvious?

Thank you

balaji.ambresh · October 20, 2023, 5:47pm

Please wait for the staff / other mentors to reply to your questions.
