Hi!
In the module “Generate a constant graph with the required transformations” there is a line defined as: "
# analyze and transform the dataset using the preprocessing function
(raw_data, raw_data_metadata) | tft_beam.AnalyzeAndTransformDataset(
    preprocessing_fn)
"
My question: What is the difference between “raw_data” and “raw_data_metadata” if the preprocessing occurs afterwards? Is “raw_data_metadata” where the transformations from the preprocessing are stored?
Hi Micaela! The raw_data_metadata contains the types of each feature column in raw_data, so it describes the input data rather than storing the transformations. In this particular case, it is defined in one of the earlier cells and it states that two features (y and x) are floats and one (s) is a string:
import tensorflow as tf
from tensorflow_transform.tf_metadata import dataset_metadata, schema_utils

# define the schema as a DatasetMetadata object
raw_data_metadata = dataset_metadata.DatasetMetadata(
    # use convenience function to build a Schema protobuf
    schema_utils.schema_from_feature_spec({
        # define a dictionary mapping each key to its feature spec type
        'y': tf.io.FixedLenFeature([], tf.float32),
        'x': tf.io.FixedLenFeature([], tf.float32),
        's': tf.io.FixedLenFeature([], tf.string),
    }))
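In case it makes the split easier to picture, here is a rough plain-Python analogy. It is only a sketch and not the tft API: raw_rows, raw_schema, conforms and center_x are names I made up, and the row values are invented. The point is that the schema is written down before any preprocessing runs and is not changed by it.

```python
# Plain-Python analogy (no TensorFlow needed); every name here is hypothetical.

# The data itself: one dict per example, playing the role of raw_data.
raw_rows = [
    {"x": 1.0, "y": 1.0, "s": "hello"},
    {"x": 2.0, "y": 2.0, "s": "world"},
    {"x": 3.0, "y": 3.0, "s": "hello"},
]

# The description of the data: one expected type per feature key,
# playing the role of raw_data_metadata.
raw_schema = {"x": float, "y": float, "s": str}


def conforms(rows, schema):
    """Return True if every row has exactly the schema's keys and types."""
    return all(
        row.keys() == schema.keys()
        and all(isinstance(row[key], schema[key]) for key in schema)
        for row in rows
    )


def center_x(rows):
    """Toy 'analyze, then transform' step: subtract the mean of x."""
    # Analyze: a full pass over the data to compute a constant.
    mean_x = sum(row["x"] for row in rows) / len(rows)
    # Transform: apply that constant to every row, returning new rows.
    return [{**row, "x_centered": row["x"] - mean_x} for row in rows]


# The schema is checked against the raw input, before any transformation.
schema_matches_input = conforms(raw_rows, raw_schema)

# The transformed rows are a new output; raw_schema itself is untouched.
transformed_rows = center_x(raw_rows)
```

Here raw_schema only says what the input looks like, while the result of the preprocessing lives in transformed_rows.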
Hope this helps!