C2W2 - Simple Feature Engineering

Hi!
In the section “Generate a constant graph with the required transformations” there is a line defined as: "

# analyze and transform the dataset using the preprocessing function
        (raw_data, raw_data_metadata) | tft_beam.AnalyzeAndTransformDataset(
            preprocessing_fn)
    )"

My question: what is the difference between “raw_data” and “raw_data_metadata” if the preprocessing occurs afterwards? Is “raw_data_metadata” where the transformations from the preprocessing are stored?

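Hi Micaela! No, raw_data_metadata does not store the transformations. It holds the schema of raw_data, i.e. the name, type, and shape of each feature column, so that AnalyzeAndTransformDataset knows how to interpret the input before preprocessing_fn is applied. In this particular case, it is defined in one of the earlier cells and declares two float features and one string feature: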

# imports needed by this cell
import tensorflow as tf
from tensorflow_transform.tf_metadata import dataset_metadata, schema_utils

# define the schema as a DatasetMetadata object
raw_data_metadata = dataset_metadata.DatasetMetadata(

    # use convenience function to build a Schema protobuf
    schema_utils.schema_from_feature_spec({

        # define a dictionary mapping each key to its feature spec
        'y': tf.io.FixedLenFeature([], tf.float32),
        'x': tf.io.FixedLenFeature([], tf.float32),
        's': tf.io.FixedLenFeature([], tf.string),
    }))
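
To make the distinction concrete, here is a minimal plain-Python sketch. It is not how TFT works internally. The sample rows, the expected_types dictionary and the conforms helper are all made up for illustration, and the raw_data in the notebook may look different. The point is only that raw_data holds the values, while the metadata describes what kind of value each feature should be:

```python
# Hypothetical sample rows; the notebook's actual raw_data may differ.
raw_data = [
    {'x': 1.0, 'y': 1.0, 's': 'hello'},
    {'x': 2.0, 'y': 2.0, 's': 'world'},
    {'x': 3.0, 'y': 3.0, 's': 'hello'},
]

# Plain-Python stand-in for the schema held by raw_data_metadata.
# It records only the expected type of each feature: no values,
# and no transformations.
expected_types = {'y': float, 'x': float, 's': str}


def conforms(row, types):
    """Return True if the row has exactly the expected keys and value types."""
    return row.keys() == types.keys() and all(
        isinstance(row[name], types[name]) for name in types
    )


# Check every row of the data against the schema stand-in.
all_rows_ok = all(conforms(row, expected_types) for row in raw_data)
print(all_rows_ok)
```

The transformations themselves live in preprocessing_fn, so neither of these two objects changes when you edit your preprocessing logic.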

Hope this helps!