Problem in my train_step_fn; W2_C3

My model.loss(prediction_dict, true_shape_tensor) does not work. The error is:
ValueError: Shapes must be equal rank, but are 3 and 1 for '{{node Loss/Loss/Select}} = Select[T=DT_FLOAT](Loss/Loss/IsNan, concat_1, Loss/stack_2)' with input shapes: [0], [4,51150,4], [0].

Note that interactive_eager_few_shot_od_training_colab.ipynb works, so I compared the tensors that are fed into model.loss. They are exactly the same, apart from the op they originate from. For example, the shape tensor of the working notebook vs. the true_shape_tensor of my problematic code:
working shape: Tensor("Const:0", shape=(4, 3), dtype=int32)
my true_shape_tensor: Tensor("Preprocessor/stack_1:0", shape=(4, 3), dtype=int32)

The same holds for the rest of the prediction dict:
working code: {'preprocessed_inputs': <tf.Tensor 'concat:0' shape=(4, 640, 640, 3) dtype=float32>, 'feature_maps': [<tf.Tensor 'ResNet50V1_FPN/FeatureMaps/top_down/smoothing_1/Relu6:0' shape=(4, 80, 80, 256) dtype=float32>, <tf.Tensor 'ResNet50V1_FPN/FeatureMaps/top_down/smoothing_2/Relu6:0' shape=(4, 40, 40, 256) dtype=float32>, <tf.Tensor 'ResNet50V1_FPN/FeatureMaps/top_down/projection_3/BiasAdd:0' shape=(4, 20, 20, 256) dtype=float32>, <tf.Tensor 'ResNet50V1_FPN/bottom_up_block5/Relu6:0' shape=(4, 10, 10, 256) dtype=float32>, <tf.Tensor 'ResNet50V1_FPN/bottom_up_block6/Relu6:0' shape=(4, 5, 5, 256) dtype=float32>], 'anchors': <tf.Tensor 'Concatenate/concat:0' shape=(51150, 4) dtype=float32>, 'final_anchors': <tf.Tensor 'Tile:0' shape=(4, 51150, 4) dtype=float32>, 'box_encodings': <tf.Tensor 'concat_1:0' shape=(4, 51150, 4) dtype=float32>, 'class_predictions_with_background': <tf.Tensor 'concat_2:0' shape=(4, 51150, 2) dtype=float32>}

my code: {'preprocessed_inputs': <tf.Tensor 'Preprocessor/stack:0' shape=(4, 640, 640, 3) dtype=float32>, 'feature_maps': [<tf.Tensor 'ResNet50V1_FPN/FeatureMaps/top_down/smoothing_1/Relu6:0' shape=(4, 80, 80, 256) dtype=float32>, <tf.Tensor 'ResNet50V1_FPN/FeatureMaps/top_down/smoothing_2/Relu6:0' shape=(4, 40, 40, 256) dtype=float32>, <tf.Tensor 'ResNet50V1_FPN/FeatureMaps/top_down/projection_3/BiasAdd:0' shape=(4, 20, 20, 256) dtype=float32>, <tf.Tensor 'ResNet50V1_FPN/bottom_up_block5/Relu6:0' shape=(4, 10, 10, 256) dtype=float32>, <tf.Tensor 'ResNet50V1_FPN/bottom_up_block6/Relu6:0' shape=(4, 5, 5, 256) dtype=float32>], 'anchors': <tf.Tensor 'Concatenate/concat:0' shape=(51150, 4) dtype=float32>, 'final_anchors': <tf.Tensor 'Tile:0' shape=(4, 51150, 4) dtype=float32>, 'box_encodings': <tf.Tensor 'concat_1:0' shape=(4, 51150, 4) dtype=float32>, 'class_predictions_with_background': <tf.Tensor 'concat_2:0' shape=(4, 51150, 2) dtype=float32>}

The only differences are "Const:0" vs. "Preprocessor/stack_1:0" for the shape tensor, and 'concat:0' vs. 'Preprocessor/stack:0' for preprocessed_inputs. Can anyone comment on what the issue with Preprocessor/stack:0 might be, and what to do about it?
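(Editor's note: a quick sketch, not taken from either notebook, showing that a tf.constant-built shape tensor and a tf.stack-built one agree in rank, dtype, and contents, so the op name alone should be harmless. The 640x640 size and batch of 4 are assumptions from the dumps above.)

```python
import tensorflow as tf

batch_size = 4

# The working notebook builds the true-shape tensor as a constant up front,
# which shows up in the graph as a "Const" node:
true_shape_const = tf.constant(batch_size * [[640, 640, 3]], dtype=tf.int32)

# Stacking per-image shapes instead produces the same values through a
# "stack" node (e.g. "Preprocessor/stack_1:0"):
per_image_shapes = [tf.constant([640, 640, 3], dtype=tf.int32)] * batch_size
true_shape_stacked = tf.stack(per_image_shapes, axis=0)

# Both are rank-2 (4, 3) int32 tensors with identical contents, so the node
# name difference by itself should not change what model.loss receives.
assert tuple(true_shape_const.shape) == tuple(true_shape_stacked.shape) == (4, 3)
assert bool(tf.reduce_all(true_shape_const == true_shape_stacked))
```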
Thanks!

Or, a more basic question: when I run my assignment script, I get the error below. Any suggestions as to what could be wrong? Thank you.

Start fine-tuning!


ValueError                                Traceback (most recent call last)
<ipython-input> in <cell line: 3>()
     18
     19         # Training step (forward pass + backwards pass)
---> 20         total_loss = train_step_fn(image_tensors,
     21                                    gt_boxes_list,
     22                                    gt_classes_list,

4 frames
/usr/local/lib/python3.10/dist-packages/tensorflow/python/util/traceback_utils.py in error_handler(*args, **kwargs)
    151     except Exception as e:
    152       filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153       raise e.with_traceback(filtered_tb) from None
    154     finally:
    155       del filtered_tb

/tmp/__autograph_generated_filezcypnxsi.py in tf__train_step_fn(image_list, groundtruth_boxes_list, groundtruth_classes_list, model, optimizer, vars_to_fine_tune)
     35         ag__.ld(print)('\n printing prediction_dict:')
     36         ag__.ld(print)(ag__.ld(prediction_dict))
---> 37         losses_dict = ag__.converted_call(ag__.ld(model).loss, (ag__.ld(prediction_dict), ag__.ld(true_shape_tensor)), None, fscope)
     38         total_loss = ag__.ld(losses_dict)['Loss/localization_loss'] + ag__.ld(losses_dict)['Loss/classification_loss']
     39         gradients = ag__.converted_call(ag__.ld(tape).gradient, (ag__.ld(total_loss), ag__.ld(vars_to_fine_tune)), None, fscope)

/usr/local/lib/python3.10/dist-packages/object_detection/meta_architectures/ssd_meta_arch.py in tf__loss(self, prediction_dict, true_image_shapes, scope)
    137             pass
    138         ag__.if_stmt(ag__.converted_call(ag__.ld(self).groundtruth_has_field, (ag__.ld(fields).InputDataFields.is_annotated,), None, fscope), if_body_5, else_body_5, get_state_5, set_state_5, ('losses_mask',), 1)
--> 139         location_losses = ag__.converted_call(ag__.ld(self).localization_loss, (ag__.ld(prediction_dict)['box_encodings'], ag__.ld(batch_reg_targets)), dict(ignore_nan_targets=True, weights=ag__.ld(batch_reg_weights), losses_mask=ag__.ld(losses_mask)), fscope)
    140         cls_losses = ag__.converted_call(ag__.ld(self).classification_loss, (ag__.ld(prediction_dict)['class_predictions_with_background'], ag__.ld(batch_cls_targets)), dict(weights=ag__.ld(batch_cls_weights), losses_mask=ag__.ld(losses_mask)), fscope)
    141

/usr/local/lib/python3.10/dist-packages/object_detection/core/losses.py in tf____call__(self, prediction_tensor, target_tensor, ignore_nan_targets, losses_mask, scope, **params)
     47             nonlocal target_tensor
     48             pass
---> 49         ag__.if_stmt(ag__.ld(ignore_nan_targets), if_body, else_body, get_state, set_state, ('target_tensor',), 1)
     50
     51         def get_state_2():

/usr/local/lib/python3.10/dist-packages/object_detection/core/losses.py in if_body()
     42         def if_body():
     43             nonlocal target_tensor
---> 44             target_tensor = ag__.converted_call(ag__.ld(tf).where, (ag__.converted_call(ag__.ld(tf).is_nan, (ag__.ld(target_tensor),), None, fscope), ag__.ld(prediction_tensor), ag__.ld(target_tensor)), None, fscope)
     45
     46         def else_body():

ValueError: in user code:

    File "<ipython-input-443-22da03bdfee6>", line 46, in train_step_fn  *
        losses_dict = model.loss(prediction_dict, true_shape_tensor)
    File "/usr/local/lib/python3.10/dist-packages/object_detection/meta_architectures/ssd_meta_arch.py", line 876, in loss  *
        location_losses = self._localization_loss(
    File "/usr/local/lib/python3.10/dist-packages/object_detection/core/losses.py", line 78, in __call__  *
        target_tensor = tf.where(tf.is_nan(target_tensor),

    ValueError: Shapes must be equal rank, but are 3 and 1 for '{{node Loss/Loss/Select}} = Select[T=DT_FLOAT](Loss/Loss/IsNan, concat_1, Loss/stack_2)' with input shapes: [0], [4,51150,4], [0].
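(Editor's note: for anyone debugging the same message, the [0] input shapes on the IsNan condition suggest the regression targets were empty when the loss ran, which can happen when groundtruth is not provided to the model before predict/loss. A minimal, hypothetical reproduction of the same rank mismatch, not taken from the assignment:)

```python
import tensorflow as tf

# The loss internally does tf.where(tf.is_nan(targets), predictions, targets).
# If targets come out empty and rank-1 while predictions are rank-3, the
# underlying Select op rejects the mismatch, much like the error above.
predictions = tf.zeros([4, 51150, 4])   # shaped like 'box_encodings'
empty_targets = tf.zeros([0])           # shaped like an empty target batch

raised = None
try:
    # tf.compat.v1.where has the same strict rank rules as the graph-mode
    # Select node named in the error message.
    tf.compat.v1.where(tf.math.is_nan(empty_targets), predictions, empty_targets)
except (ValueError, tf.errors.InvalidArgumentError) as err:
    raised = err

print(type(raised).__name__ if raised is not None else "no error")
```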

Hello @Dennis_Sinitsky

Based on the error you posted:

  1. Check that the path (string) for each image is defined and recalled correctly.

  2. Next, check the code in the section that runs a dummy image. Make sure you convert each image into a tensor with the correct shape; the shape should not be [4, 640, 640, 3], as the instructions clearly indicate passing a batch of 1.

  3. Next, in the gradient tape loss, make sure the true shape tensor has the correct shape and the preprocessed image is not hard-coded: you only need to apply tf.concat to the preprocessed image list. For the true shape tensor, use tf.constant.
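(Editor's note: putting that checklist together, a rough sketch of the train-step pattern from the public eager few-shot colab might look like the following. Names such as `detection_model`, `image_tensors`, `gt_boxes_list`, `gt_classes_list`, and the 640x640 size are assumptions from this thread, not a definitive implementation.)

```python
import tensorflow as tf

def train_step_sketch(detection_model, image_tensors, gt_boxes_list,
                      gt_classes_list, optimizer, vars_to_fine_tune):
  # 1. Each image tensor enters preprocess() as a batch of 1: shape (1, H, W, 3).
  preprocessed = [detection_model.preprocess(t)[0] for t in image_tensors]

  # 2. Concatenate the batch-of-1 results rather than hard-coding a tensor.
  preprocessed_images = tf.concat(preprocessed, axis=0)  # e.g. (4, 640, 640, 3)

  # 3. Build the true-shape tensor with tf.constant: one [640, 640, 3] row
  #    per image in the batch.
  true_shape_tensor = tf.constant(
      len(image_tensors) * [[640, 640, 3]], dtype=tf.int32)

  with tf.GradientTape() as tape:
    # Groundtruth must be provided before computing the loss, otherwise the
    # loss sees empty target tensors.
    detection_model.provide_groundtruth(
        groundtruth_boxes_list=gt_boxes_list,
        groundtruth_classes_list=gt_classes_list)
    prediction_dict = detection_model.predict(
        preprocessed_images, true_shape_tensor)
    losses_dict = detection_model.loss(prediction_dict, true_shape_tensor)
    total_loss = (losses_dict['Loss/localization_loss'] +
                  losses_dict['Loss/classification_loss'])

  gradients = tape.gradient(total_loss, vars_to_fine_tune)
  optimizer.apply_gradients(zip(gradients, vars_to_fine_tune))
  return total_loss
```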

Let me know if your issue is resolved or if any further help is required.

Regards
DP

Hi Deepti,
thanks for your message. I also think that something is not preprocessed right; I always have difficulty with tensor dimensions and keeping track of them. Let me poke at this problem for a couple more days before seeking more help.
Thank you
Dennis

Sure, @Dennis_Sinitsky, send your notebook if you are not able to find a solution; I have pointed out all the places you need to check for corrections. There are also threads related to this assignment which might help you, so please use the search tool.

I finally got it. :exploding_head:


I can understand that expression :joy: I had a similar experience when I was doing the course.