C2_W3_Assignment Exercise 11: Transform Error

CASE 1

Instantiate the Transform component

transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file=os.path.abspath(_census_transform_module_file))

END CODE HERE

Run the component

context.run(transform)


NameError Traceback (most recent call last)
in
4 examples=example_gen.outputs['examples'],
5 schema=schema_gen.outputs['schema'],
----> 6 module_file=os.path.abspath(_census_transform_module_file))
7
8

NameError: name '_census_transform_module_file' is not defined
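For context, a NameError like this just means the notebook variable holding the module filename was never defined in the current session (or the cell that defines it was never run). In TFX notebooks of this kind the variable is normally set, and the module file written out, in an earlier cell; a minimal sketch of that pattern, using an assumed variable name and filename rather than the assignment's exact ones:

# Assumed pattern: the module filename variable must exist before Transform is built.
_cover_transform_module_file = 'cover_transform.py'

# The cell that actually creates the file usually starts with the Jupyter magic
#   %%writefile {_cover_transform_module_file}
# and contains preprocessing_fn; it has to be executed before instantiating Transform.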

CASE 2

Instantiate the Transform component

transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'])

END CODE HERE

Run the component

context.run(transform, enable_cache=False)


ValueError Traceback (most recent call last)
in
1 ### START CODE HERE ###
2 # Instantiate the Transform component
----> 3 transform = Transform(
4 examples=example_gen.outputs['examples'],
5 schema=schema_gen.outputs['schema'])

/opt/conda/lib/python3.8/site-packages/tfx/components/transform/component.py in __init__(self, examples, schema, module_file, preprocessing_fn, splits_config, transform_graph, transformed_examples, input_data, analyzer_cache, instance_name, materialize, disable_analyzer_cache, custom_config)
152 examples = input_data
153 if bool(module_file) == bool(preprocessing_fn):
--> 154 raise ValueError(
155 "Exactly one of 'module_file' or 'preprocessing_fn' must be supplied."
156 )

ValueError: Exactly one of 'module_file' or 'preprocessing_fn' must be supplied.
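The ValueError states the constraint directly: Transform requires exactly one of module_file or preprocessing_fn, so omitting both (as in CASE 2) fails just as supplying both would. A hedged sketch of the two accepted forms, with placeholder names:

import os
from tfx.components import Transform

# Form 1: point Transform at a module file that defines preprocessing_fn.
transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file=os.path.abspath(_cover_transform_module_file))

# Form 2 (mutually exclusive alternative): name the function directly as a string.
# transform = Transform(
#     examples=example_gen.outputs['examples'],
#     schema=schema_gen.outputs['schema'],
#     preprocessing_fn='cover_transform.preprocessing_fn')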

Hi @jschoi

Try changing the name of your module_file to "_cover_transform_module_file".
I'm assuming you have kept the file's default name, and that this is a copy of a code snippet from another file.

Let me know if that doesn’t help.

Chris

@Chris
I have tested 6 cases, but all of them produce errors!

transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file=os.path.abspath(_cover_transform_module_file))

module_file=os.path.abspath(_cover_transform_module_file))
module_file=os.path.abspath("_cover_transform_module_file"))
module_file=os.path.abspath('_cover_transform_module_file'))
module_file="_cover_transform_module_file")
module_file='_cover_transform_module_file')
module_file=_cover_transform_module_file)

@jschoi

The last option should work. Eg:

transform = Transform(
    examples = example_gen.outputs['examples'],
    schema = schema_gen.outputs['schema'],
    module_file = _cover_transform_module_file
)

Can you supply the error output when you run this?

Also, perhaps you need to rerun the previous code blocks?

One thing to consider…

The schema passed did not appear to be the “curated” schema per the instructions of Exercise 11.
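For reference, when an exercise asks for the curated schema it is usually imported as its own artifact (for example with an ImporterNode) and that node's output is what gets wired into Transform instead of schema_gen's output. A rough sketch under that assumption; the directory variable and instance name are illustrative, not the assignment's exact ones:

import os
from tfx.components import ImporterNode, Transform
from tfx.types import standard_artifacts

# Import the hand-curated schema file as a pipeline artifact (path is illustrative).
user_schema_importer = ImporterNode(
    instance_name='import_user_schema',
    source_uri=_updated_schema_dir,  # directory containing the curated schema.pbtxt
    artifact_type=standard_artifacts.Schema)

context.run(user_schema_importer, enable_cache=False)

# Pass the curated schema channel to Transform rather than schema_gen.outputs['schema'].
transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=user_schema_importer.outputs['result'],
    module_file=os.path.abspath(_cover_transform_module_file))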

transform = Transform(
    examples = example_gen.outputs['examples'],
    schema = schema_gen.outputs['schema'],
    module_file = _cover_transform_module_file
)

context.run(transform, enable_cache=False)


AttributeError Traceback (most recent call last)
in
10
11 # Run the component
---> 12 context.run(transform, enable_cache=False)

/opt/conda/lib/python3.8/site-packages/tfx/orchestration/experimental/interactive/interactive_context.py in run_if_ipython(*args, **kwargs)
65 # IPYTHON variable is set by IPython, see
66 # IPython reference — IPython 0.10.2 documentation.
---> 67 return fn(*args, **kwargs)
68 else:
69 absl.logging.warning(

/opt/conda/lib/python3.8/site-packages/tfx/orchestration/experimental/interactive/interactive_context.py in run(self, component, enable_cache, beam_pipeline_args)
180 telemetry_utils.LABEL_TFX_RUNNER: runner_label,
181 }):
--> 182 execution_id = launcher.launch().execution_id
183
184 return execution_result.ExecutionResult(

/opt/conda/lib/python3.8/site-packages/tfx/orchestration/launcher/base_component_launcher.py in launch(self)
200 absl.logging.info('Running executor for %s',
201 self._component_info.component_id)
--> 202 self._run_executor(execution_decision.execution_id,
203 execution_decision.input_dict,
204 execution_decision.output_dict,

/opt/conda/lib/python3.8/site-packages/tfx/orchestration/launcher/in_process_component_launcher.py in _run_executor(self, execution_id, input_dict, output_dict, exec_properties)
65 executor_context) # type: ignore
66
---> 67 executor.Do(input_dict, output_dict, exec_properties)

/opt/conda/lib/python3.8/site-packages/tfx/components/transform/executor.py in Do(self, input_dict, output_dict, exec_properties)
415 label_outputs[labels.CACHE_OUTPUT_PATH_LABEL] = cache_output
416 status_file = 'status_file' # Unused
--> 417 self.Transform(label_inputs, label_outputs, status_file)
418 absl.logging.debug('Cleaning up temp path %s on executor success',
419 temp_path)

/opt/conda/lib/python3.8/site-packages/tfx/components/transform/executor.py in Transform(failed resolving arguments)
912 # Inspecting the preprocessing_fn even if we know we need a full pass in
913 # order to fail faster if it fails.
--> 914 analyze_input_columns = tft.get_analyze_input_columns(
915 preprocessing_fn, typespecs)
916

/opt/conda/lib/python3.8/site-packages/tensorflow_transform/inspect_preprocessing_fn.py in get_analyze_input_columns(preprocessing_fn, specs)
56 input_signature = impl_helper.batched_placeholders_from_specs(
57 specs)
---> 58 _ = preprocessing_fn(input_signature.copy())
59
60 tensor_sinks = graph.get_collection(analyzer_nodes.TENSOR_REPLACEMENTS)

~/work/cover_transform.py in preprocessing_fn(inputs)
28 # Transform using scaling of 0 to 1 function
29 # Hint: tft.scale_to_0_1
---> 30 features_dict[_transformed_name(feature)] = tft. tft.scale_to_0_1(data_col)
31
32 for feature in _SCALE_Z_FEATURE_KEYS:

AttributeError: module 'tensorflow_transform' has no attribute 'tft'
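The AttributeError actually pinpoints the problem: line 30 of cover_transform.py reads tft. tft.scale_to_0_1(...), so Python looks for an attribute named tft on the tensorflow_transform module itself. The module prefix should appear only once; a sketch of the corrected line, reusing the names shown in the traceback:

# Scale the column to the [0, 1] range; reference the tft module exactly once.
features_dict[_transformed_name(feature)] = tft.scale_to_0_1(data_col)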

Get the URI of the output artifact representing the transformed examples

train_uri = os.path.join(transform_uri, 'train')

Get the list of files in this directory (all compressed TFRecord files)

tfrecord_filenames = [os.path.join(train_uri, name)
                      for name in os.listdir(train_uri)]

Create a TFRecordDataset to read these files

transformed_dataset = tf.data.TFRecordDataset(tfrecord_filenames, compression_type="GZIP")
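Note that train_uri is built from transform_uri, which is not defined in the snippet above; in the interactive-context tutorials it is typically read off the Transform component's output channel, roughly like this (hedged, since the earlier cell is not shown here):

# Assumed earlier cell: URI of the materialized transformed_examples artifact.
transform_uri = transform.outputs['transformed_examples'].get()[0].uri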

FileNotFoundError Traceback (most recent call last)
in
4 # Get the list of files in this directory (all compressed TFRecord files)
5 tfrecord_filenames = [os.path.join(train_uri, name)
----> 6 for name in os.listdir(train_uri)]
7
8 # Create a TFRecordDataset to read these files

FileNotFoundError: [Errno 2] No such file or directory: './pipeline/Transform/transformed_examples/106/train'

Import helper function to get examples from the dataset

from util import get_records

Get 3 records from the dataset

sample_records_xf = get_records(transformed_dataset, 3)

Print the output

pp.pprint(sample_records_xf)

NameError Traceback (most recent call last)
in
3
4 # Get 3 records from the dataset
----> 5 sample_records_xf = get_records(transformed_dataset, 3)
6
7 # Print the output

NameError: name 'transformed_dataset' is not defined

Hi @jschoi

The key part of the error output is this:

FileNotFoundError: [Errno 2] No such file or directory: './pipeline/Transform/transformed_examples/106/train'

This is telling us that there is no transform output. Do you get an output similar to the image below after running your transform with context.run(transform, enable_cache=False)?

If not, can you please send me a snapshot of what is being returned?
There may then be an issue with your _cover_transform_module_file.
Can you please send me the full code from this file?