TensorFlow - reshape error

I trained a CNN model in Colab Pro back in February. The model reshapes each input image, and training was successful at the time. However, now when I train the same model without any changes (the dataset is the same as well), I get this error:
InvalidArgumentError: Input to reshape is a tensor with 65536 values, but the requested shape has 3
Input images are of size (256, 256, 3). I don’t understand why or how this error came up. I’ve manually checked that every image in my dataset is fine, and there hasn’t been a single change to the model architecture or the input pipeline.

The exception trace must point back to some particular source line, right? If nothing changed, then why are the results different? Maybe it’s a new version of TF that is not backwards compatible. Sounds like time for some debugging.

Thanks @paulinpaloalto. I did some diagnosis and found the following:
I’ll first briefly describe my model architecture so that it’s easy to follow. I create two tf.data.Dataset objects. In the first one, I pass each input image (of dimension 288 x 288 x 3) through a function that performs template matching: a template (of dimension 256 x 256 x 3) is used to pick out the region of interest from each input image, so the resultant images are of dimension 256 x 256 x 3. In the second dataset, I do the same thing with an additional function that applies a random brightness adjustment to each image. Finally, I join these two datasets using the concatenate method of tf.data.Dataset (see tf.data.Dataset | TensorFlow v2.12.0 for the concatenate documentation).
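In code, the pipeline is set up roughly like this (a minimal sketch: `match_template` here is just a fixed crop standing in for the actual template matching, and the toy data and brightness delta are my assumptions, not from the real pipeline):

```python
import tensorflow as tf

IMG_SHAPE = (288, 288, 3)  # raw input images
ROI_SHAPE = (256, 256, 3)  # region of interest after template matching

def match_template(image):
    # Hypothetical stand-in for the real template matching: take a fixed
    # 256x256 crop as the "region of interest" and pin down the static
    # shape with a reshape.
    roi = image[:256, :256, :]
    return tf.reshape(roi, ROI_SHAPE)

def add_random_brightness(image):
    # The max_delta value is an assumption.
    return tf.image.random_brightness(image, max_delta=0.2)

# Toy data standing in for the real image files.
images = tf.random.uniform((8, *IMG_SHAPE))
base = tf.data.Dataset.from_tensor_slices(images)

ds1 = base.map(match_template)
ds2 = base.map(match_template).map(add_random_brightness)
combined = ds1.concatenate(ds2)
```

With this structure, `combined` yields all the plain ROIs followed by the brightness-augmented ones.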

  1. When I examined the exception trace, I found that the reshape error (the one I mentioned in this post) occurs somewhere in the template matching function. To check whether that function is correctly defined, I passed each image through it and saved the outputs to disk. This ran perfectly fine without any issues, so I became perplexed.
  2. Next, I tried something different. I created only one dataset object for training, in which each input image (288, 288, 3) is passed through the template matching function (template 256, 256, 3). I did not use the second dataset object, in which template matching is followed by random brightness. To my surprise, there was no error this time. This gives me a real headache, because the template matching function now runs without error even though nothing in it has changed.
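One more diagnostic that could tie the failure to a specific element is to iterate the dataset eagerly and catch the exception (a sketch; `find_bad_element` is a debugging helper I’m assuming, not part of the original pipeline):

```python
import tensorflow as tf

def find_bad_element(ds, limit=None):
    """Iterate a tf.data.Dataset eagerly and return the index of the
    first element that raises InvalidArgumentError (None if all pass)."""
    it = iter(ds)
    i = 0
    while limit is None or i < limit:
        try:
            next(it)
        except StopIteration:
            return None
        except tf.errors.InvalidArgumentError as e:
            print(f"Element {i} failed: {e.message}")
            return i
        i += 1
    return None
```

Running this on the combined dataset would at least say whether the failing element comes from the first or the second half.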

Thanks for the detailed explanation of how your model is set up. It sounds like the bug is in the combination of the template matching logic with the added brightness. Maybe you could add some additional logic in the “add brightness” function or in the template matching function to check for anomalies, so that you get the logical equivalent of a “breakpoint”. The datasets are too big to be able to just print something as every image is handled, so you need a way to narrow in on what is causing the exception. If you can’t see enough in the aftermath of the exception to figure out what is going on, then you need a way to “trigger” earlier.
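For example, a shape assertion inside a map function can serve as that kind of breakpoint, failing with a readable message the moment a bad image shows up (a sketch; the `check_shape` helper and the expected shape are assumptions about your pipeline):

```python
import tensorflow as tf

def check_shape(image, expected=(256, 256, 3)):
    # Runs inside the tf.data graph and raises InvalidArgumentError with
    # a readable message as soon as an image of the wrong shape appears.
    tf.debugging.assert_equal(
        tf.shape(image),
        tf.constant(expected, dtype=tf.int32),
        message="unexpected image shape ahead of the reshape")
    return image

# Hypothetical use: insert just before the step that fails, e.g.
# ds = ds.map(check_shape).map(template_matching_fn)
```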

But looking back at the original exception, 65536 doesn’t make sense as the size of one of the inputs if the images are (256, 256, 3): a full image has 256 * 256 * 3 = 196608 values, while 256^2 = 65536 is the size of a single-channel (256, 256) tensor. Also, what does your code look like for specifying the target shape in the reshape? Is it based on aspects of the input? What would have to be wrong with the input for the resultant value to be 3?
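To make the arithmetic concrete: a tensor with 65536 values looks like an image that has lost its channel dimension, and reshaping it to any target holding 3 values reproduces the exact error message (a quick sketch):

```python
import tensorflow as tf

gray = tf.zeros((256, 256))          # channel dimension lost
assert int(tf.size(gray)) == 65536   # 256 ** 2, the number in the error

try:
    # A target shape holding 3 values, e.g. (3,), reproduces the message:
    # "Input to reshape is a tensor with 65536 values, but the requested
    # shape has 3"
    tf.reshape(gray, (3,))
except tf.errors.InvalidArgumentError as e:
    print(e.message)
```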

The “meta” point here being that debugging always has to start by reasoning about the evidence you can see. If that’s not enough, then you need instrumentation to peel the next layer of the onion to see more deeply.

Thanks for the suggestions @paulinpaloalto !
It turns out that the error ultimately has something to do with the concatenate operation that I’m using to join the two dataset objects. I’ll update this thread once I figure out exactly what the source of the error is.
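In the meantime, one quick sanity check is comparing the element_spec of the two datasets before concatenating, since concatenate expects compatible specs and a map function that loses the static shape can push errors downstream (a sketch with toy datasets; `specs_match` is just a helper for this check):

```python
import tensorflow as tf

def specs_match(a, b):
    # element_spec carries the dtype and static shape of each element;
    # a mismatch here is a red flag before calling concatenate.
    return a.element_spec == b.element_spec

ds1 = tf.data.Dataset.from_tensors(tf.zeros((256, 256, 3)))
ds2 = tf.data.Dataset.from_tensors(tf.ones((256, 256, 3)))

if specs_match(ds1, ds2):
    combined = ds1.concatenate(ds2)
else:
    print(ds1.element_spec)
    print(ds2.element_spec)
```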
Thanks for your time.