I have written the code to generate the model variables by running a prediction on a dummy image.
I won't post the code here so I don't spoil others' learning.
If I run the code cell once, I get the following:
ValueError: Received incompatible tensor with shape (1, 1, 2048, 512) when attempting to restore variable with shape (3, 3, 256, 256) and name conv4_block5_2_conv/kernel:0.
But it succeeds if I re-run exactly the same cell without any change.
Am I doing something wrong that could prevent the model from working properly?
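For context on why re-running the cell can succeed: TF2 checkpoint restoration is deferred, so `ckpt.restore(...)` records the saved tensors, and each variable is matched by name and shape only when the model actually creates it (which is what the dummy-image prediction triggers). Here is a rough, framework-free sketch of that matching step; the `DeferredCheckpoint` class is illustrative, not the TensorFlow API:

```python
class DeferredCheckpoint:
    """Toy model of deferred (name-matched) checkpoint restoration."""

    def __init__(self, saved):
        # saved: dict mapping variable name -> shape stored in the checkpoint
        self.saved = saved

    def restore_variable(self, name, shape):
        # Matching happens lazily, when the variable is created.
        saved_shape = self.saved.get(name)
        if saved_shape is not None and saved_shape != shape:
            raise ValueError(
                f"Received incompatible tensor with shape {saved_shape} "
                f"when attempting to restore variable with shape {shape} "
                f"and name {name}"
            )
        return saved_shape


ckpt = DeferredCheckpoint({"conv4_block5_2_conv/kernel:0": (1, 1, 2048, 512)})

# A model built from the wrong config creates this variable with a
# different shape, so the lazy match fails at prediction time:
try:
    ckpt.restore_variable("conv4_block5_2_conv/kernel:0", (3, 3, 256, 256))
except ValueError as e:
    print("restore failed:", e)
```

The point of the sketch is that the error fires when the variable is created, not when `restore` is called, which is why it shows up during the dummy-image prediction.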
When I run the training loop, I don't see the loss decreasing, and I am wondering whether it's related to the error mentioned above or a separate problem:
Start fine-tuning!
batch 0 of 100, loss=48.75761
batch 10 of 100, loss=48.00668
batch 20 of 100, loss=46.562923
batch 30 of 100, loss=44.8776
batch 40 of 100, loss=43.10805
batch 50 of 100, loss=41.30913
batch 60 of 100, loss=39.499985
batch 70 of 100, loss=37.687267
batch 80 of 100, loss=35.873306
batch 90 of 100, loss=34.058926
Done fine-tuning!
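For what it's worth, the log above shows the loss is decreasing, just slowly and almost linearly. A quick check of the average drop per batch, using the numbers posted:

```python
# Loss values copied from the training log above (batches 0, 10, ..., 90).
losses = [48.75761, 48.00668, 46.562923, 44.8776, 43.10805,
          41.30913, 39.499985, 37.687267, 35.873306, 34.058926]

total_drop = losses[0] - losses[-1]
per_batch = total_drop / 90  # log spans batches 0 through 90
print(f"total drop: {total_drop:.2f}, avg per batch: {per_batch:.3f}")
```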
The dummy image has the expected shape, tf.zeros([1, 640, 640, 3]).
I’ve just also noticed that the preloaded weights don’t have the expected shape:
# Test Code:
assert len(detection_model.trainable_variables) > 0, "Please pass in a dummy image to create the trainable variables."
print(detection_model.weights[0].shape)
print(detection_model.weights[231].shape)
print(detection_model.weights[462].shape)
where, according to the Colab text, the expected output is:
(3, 3, 256, 24)
(512,)
(256,)
but I am getting:
(3, 3, 256, 24)
(256,)
(256,)
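A quick way to see which of the printed weights disagrees with the Colab text is to diff the expected and observed shapes directly (values hand-copied from above):

```python
expected = [(3, 3, 256, 24), (512,), (256,)]   # from the Colab text
observed = [(3, 3, 256, 24), (256,), (256,)]   # what I actually get

# Indices where the two lists disagree:
mismatched = [i for i, (e, o) in enumerate(zip(expected, observed)) if e != o]
print(mismatched)  # -> [1], i.e. only detection_model.weights[231] differs
```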
So I'm probably not selecting the layers to be trained correctly.
This could also be caused by your dummy image having different dimensions from the image used in the assignment. One would need to look at your code to give a better review. You can share the code if it is not part of any graded assignment, but please mark the post with the header "Not a Coursera graded assignment" when posting code.
Also, your training behavior depends on the batch size relative to your whole dataset and training set, the loss parameters you used, and other model hyperparameters. The fact that your loss for the first step seems to have started at a lower point suggests a dataset discrepancy.
I think you need to look at the tensor shape of your dummy image,
as the error says the tensor shape is incompatible. The dummy image you used needs to produce variables compatible with the shape (3, 3, 256, 256) being restored.
That means you could reshape the dummy image to match the required restored variable. Also, looking at the conv4 block kernel name ending in :0, check that your kernels match the assignment's algorithm.
I’ve found the issue!
I was pointing to an incorrect path for the pipeline configuration.
So I was instantiating the correct model with the wrong configuration; hence the error when I provided the dummy image, and the incorrect shapes for the given layers.
Once I put the correct pipeline configuration, the model behaved as expected.
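Since the root cause was a wrong pipeline-config path, a cheap guard is to fail fast with a clear message before building the model. This is just a generic sketch (load_pipeline_config is a hypothetical helper, not part of the Object Detection API):

```python
from pathlib import Path


def load_pipeline_config(path):
    """Read a pipeline .config file, failing early if the path is wrong."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"pipeline config not found: {path}")
    return p.read_text()
```

Calling something like this before building the model makes a typo in the path surface immediately, instead of showing up later as a confusing shape-mismatch error during checkpoint restoration.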
Thanks a lot for the clues and the patience in directing me toward the solution on a Sunday. You are too kind.