C2_W3_Lab2: Schema URI mismatch

I’m seeing an unexpected URI for the updated schema and was wondering why.

We have set the directory for the updated schema with the following:

# Declare the path to the updated schema directory
_updated_schema_dir = f'{_pipeline_root}/updated_schema'

# Declare the path to the schema file
schema_file = os.path.join(_updated_schema_dir, 'schema.pbtxt')

Resulting in the file path being ./pipeline//updated_schema/schema.pbtxt

But when we print the schema info later in the notebook, reading from the MLMD store with this code:

# Get artifact types
schema_list = store.get_artifacts_by_type('Schema')

[(f'schema uri: {schema.uri}', f'schema id:{schema.id}') for schema in schema_list]

It shows the following as the URI for the updated schema:
('schema uri: ./pipeline/ImportSchemaGen/schema/4', 'schema id:4')

So I’m confused as to what happened to the updated_schema subdirectory.
In fact, it seems like we explicitly told the pipeline to use the full file path (all the way down to the file name), but the pipeline seems to have ignored it.

Can someone help me understand why we are seeing this difference?

Hi Tigran! In Exercise 7, you used the ImportSchemaGen component to import the schema_file into the pipeline. The component saves that file in the directory structure employed by TFX (here, ./pipeline/ImportSchemaGen/schema/4) and generates an artifact whose URI is the location of the saved file, not your original updated_schema path. That is what you see when you scan the artifacts saved in the MLMD store. Hope this helps!
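
If you want to convince yourself that the artifact directory really holds a saved copy of your file, here is a minimal sketch using only the standard library. The helper name find_identical_copy is hypothetical, and the demo builds a stand-in directory layout in a temp folder (including the assumption that the saved file keeps the name schema.pbtxt) so it runs on its own. In the notebook you would pass schema.uri and schema_file instead.

```python
import filecmp
import os
import shutil
import tempfile


def find_identical_copy(artifact_uri, source_file):
    """Return the first file under artifact_uri whose bytes match source_file, else None."""
    for root, _dirs, files in os.walk(artifact_uri):
        for name in sorted(files):
            candidate = os.path.join(root, name)
            # shallow=False forces a byte-by-byte comparison instead of comparing stat info
            if filecmp.cmp(candidate, source_file, shallow=False):
                return candidate
    return None


# Stand-in for the notebook's layout, built in a temp dir (not the real pipeline root).
demo_root = tempfile.mkdtemp()
source_file = os.path.join(demo_root, 'updated_schema', 'schema.pbtxt')
artifact_uri = os.path.join(demo_root, 'ImportSchemaGen', 'schema', '4')
os.makedirs(os.path.dirname(source_file))
os.makedirs(artifact_uri)
with open(source_file, 'w') as f:
    f.write('feature { name: "age" type: INT }\n')

# Simulates the component saving its own copy (assumed file name: schema.pbtxt).
shutil.copy(source_file, os.path.join(artifact_uri, 'schema.pbtxt'))

copy_path = find_identical_copy(artifact_uri, source_file)
print(os.path.relpath(copy_path, demo_root))
```

If the helper returns a path inside the artifact directory, the URI in MLMD and your updated_schema path simply refer to two copies of the same schema.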

Thank you @chris.favila!
I see, so the schema file is only used to load the schema, which is then saved and managed within the TFX directory structure.
