Exercise 10: Define the training step


I have been stuck on this exercise for a while now. I looked it up but could not locate the cell that is causing the error. Could I please get some help?

Check your gradient tape implementation, and also check whether you are selecting the right layers from the ResNet network. It is probably best to go back and redo or check steps 4 to 10.

Has there been any solution? I am stuck with the same issue and have re-checked my code three times now.

All my outputs from steps 4 to 10 are identical to the expected ones, and I can’t find any mistake in my train_step_fn.

I preprocess the images by building
preprocessed_image_tensor from preprocessed_image_list using concat, and
true_shape_tensor from true_shape_list using concat.

Then I make the prediction with model.predict and the two tensors defined above.

The total_loss uses the losses_dict from “Calculate Loss”, and the gradients are calculated using tape.gradient with total_loss and vars_to_fine_tune. I then update the variables with optimizer.apply_gradients and zip.
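For comparison, here is a minimal sketch of the general train-step pattern described above. It is not the assignment solution: the Dense model, the mean-squared-error loss, and the names train_step, inputs, and targets are hypothetical stand-ins for the detection model and its losses. The one point it is meant to show is that the forward pass and the loss are both computed inside the tf.GradientTape context.

```python
import tensorflow as tf

# Hypothetical stand-in for the detection model: one Dense layer.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
vars_to_fine_tune = model.trainable_variables


def train_step(inputs, targets):
    with tf.GradientTape() as tape:
        # Forward pass and loss both run while the tape is recording,
        # so the loss is connected to the variables being tuned.
        predictions = model(inputs, training=True)
        total_loss = tf.reduce_mean(tf.square(predictions - targets))
    # Differentiate the loss with respect to the selected variables.
    gradients = tape.gradient(total_loss, vars_to_fine_tune)
    # Pair each gradient with its variable and apply the update.
    optimizer.apply_gradients(zip(gradients, vars_to_fine_tune))
    return total_loss


inputs = tf.random.normal((8, 3))
targets = tf.random.normal((8, 1))
loss_before = train_step(inputs, targets)
loss_after = train_step(inputs, targets)
print(float(loss_before), float(loss_after))
```

If the prediction or the loss is computed before the with block, tape.gradient has nothing to differentiate and returns None for every variable.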

What errors are you getting?

Start fine-tuning!

ValueError Traceback (most recent call last)
in ()
21 detection_model,
22 optimizer,
---> 23 to_fine_tune
24 )
25

1 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in autograph_handler(*args, **kwargs)
1127 except Exception as e: # pylint:disable=broad-except
1128 if hasattr(e, "ag_error_metadata"):
-> 1129 raise e.ag_error_metadata.to_exception(e)
1130 else:
1131 raise

ValueError: in user code:

File "<ipython-input-51-dd13c5ea4070>", line 59, in train_step_fn  *
    optimizer.apply_gradients(zip(gradients, vars_to_fine_tune))
File "/usr/local/lib/python3.7/dist-packages/keras/optimizer_v2/optimizer_v2.py", line 633, in apply_gradients  **
    grads_and_vars = optimizer_utils.filter_empty_gradients(grads_and_vars)
File "/usr/local/lib/python3.7/dist-packages/keras/optimizer_v2/utils.py", line 73, in filter_empty_gradients
    raise ValueError(f"No gradients provided for any variable: {variable}. "

ValueError: No gradients provided for any variable: (['WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalBoxHead/BoxPredictor/kernel:0', 'WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalBoxHead/BoxPredictor/bias:0', 'WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalClassHead/ClassPredictor/kernel:0', 'WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalClassHead/ClassPredictor/bias:0', 'WeightSharedConvolutionalBoxPredictor/BoxPredictionTower/conv2d_0/kernel:0', 'WeightSharedConvolutionalBoxPredictor/BoxPredictionTower/conv2d_0/BatchNorm/feature_0/gamma:0', 'WeightSharedConvolutionalBoxPredictor/BoxPredictionTower/conv2d_0/BatchNorm/feature_0/beta:0', 'WeightSharedConvolutionalBoxPredictor/BoxPredictionTower/conv2d_1/kernel:0', 'WeightSharedConvolutionalBoxPredictor/BoxPredictionTower/conv2d_1/BatchNorm/feature_0/gamma:0', 'WeightSharedConvolutionalBoxPredictor/BoxPredictionTower/conv2d_1/BatchNorm/feature_0/beta:0', 'WeightSharedConvolutionalBoxPredictor/BoxPredictionTower/conv2d_2/kernel:0', 'WeightSharedConvolutionalBoxPredictor/BoxPredictionTower/conv2d_2/BatchNorm/feature_0/gamma:0', 'WeightSharedConvolutionalBoxPredictor/BoxPredictionTower/conv2d_2/BatchNorm/feature_0/beta:0', 'WeightSharedConvolutionalBoxPredictor/BoxPredictionTower/conv2d_3/kernel:0', 'WeightSharedConvolutionalBoxPredictor/BoxPredictionTower/conv2d_3/BatchNorm/feature_0/gamma:0', 'WeightSharedConvolutionalBoxPredictor/BoxPredictionTower/conv2...

“No gradients provided for any variable” means that the gradient calculation is not right, so you have to trace back its dependencies. I would pay special attention to:

  1. losses_dict
  2. The to_fine_tune list, which should contain 4 variables:
WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalBoxHead/BoxPredictor/kernel:0
WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalClassHead/ClassPredictor/kernel:0

and their biases. No more than 4, but make sure you choose the right ones!

  3. Weights restoration is important here.

  4. Restore the checkpoint. Can you restore it, and can you find the files in the directory?

  5. It is also very important to download the right checkpoint.
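To make the "trace back the dependencies" advice concrete, here is a small sketch of one common way this error arises. It is an illustration under assumptions, not a diagnosis of the code in this thread: the variable w, the input x, and the helper missing_gradients are all hypothetical. It compares a loss computed outside the tape with one computed inside it, and lists the variables whose gradient came back as None.

```python
import tensorflow as tf

w = tf.Variable([1.0, 2.0], name="w")
x = tf.constant([3.0, 4.0])

# Case A: the loss is computed before the tape starts recording.
loss_outside = tf.reduce_sum(w * x)
with tf.GradientTape() as tape:
    pass  # nothing involving w is recorded here
grads_disconnected = tape.gradient(loss_outside, [w])

# Case B: the loss is computed while the tape is recording.
with tf.GradientTape() as tape:
    loss_inside = tf.reduce_sum(w * x)
grads_connected = tape.gradient(loss_inside, [w])


def missing_gradients(gradients, variables):
    """Return the names of variables whose gradient is None."""
    return [v.name for g, v in zip(gradients, variables) if g is None]


print(missing_gradients(grads_disconnected, [w]))
print(missing_gradients(grads_connected, [w]))
```

Running a check like this on the output of tape.gradient, just before apply_gradients, shows whether every variable is disconnected (usually a tape or loss problem) or only some of them (usually a variable-selection problem).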

Thank you very much for your fast reply!

  1. losses_dict
    I do not see any errors here, printing the keys and losses in train_step_fn shows:
loss dictionary keys: dict_keys(['Loss/localization_loss', 'Loss/classification_loss'])
localization loss 0.08500113
classification loss 1.23899662
total_loss: Tensor("add:0", shape=(), dtype=float32)

It just seems a bit odd to me that my total_loss is a tensor, as it is the sum of two values.

  2. to_fine_tune
    Should be correct: printing vars_to_fine_tune.name in train_step_fn outputs the kernels and biases
WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalBoxHead/BoxPredictor/kernel:0
WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalBoxHead/BoxPredictor/bias:0
WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalClassHead/ClassPredictor/kernel:0
WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalClassHead/ClassPredictor/bias:0

  3. Weights restoration
    What do you mean by weights restoration?

  4. Restore checkpoint
    The checkpoint has been restored using checkpoint.restore as shown in the video.

  5. Download the right checkpoints
    I’ve downloaded http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz
    and copied it to /content/models/research/object_detection/test_data/checkpoint.
    The files are there and the path is set correctly.
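On the total_loss printout in point 1: a line like Tensor("add:0", shape=(), dtype=float32) is what Python's print shows for any tensor while a tf.function is being traced, so on its own it does not indicate a bug. The sketch below illustrates this; the function name add_losses is hypothetical and the two inputs are rounded from the values posted above. Whether this has anything to do with the missing gradients is not something the posted output settles.

```python
import tensorflow as tf


@tf.function
def add_losses(localization_loss, classification_loss):
    total_loss = localization_loss + classification_loss
    # Python print runs at trace time and shows a symbolic tensor.
    print("traced:", total_loss)
    # tf.print runs when the graph executes and shows the value.
    tf.print("value:", total_loss)
    return total_loss


total = add_losses(tf.constant(0.085), tf.constant(1.239))
```

Outside the tf.function, the returned total is an ordinary eager tensor and can be converted with float(total).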