Hello community,
I would like to know why, in this course, the shape of an RGB input image is height * width * channels. Since an RGB image is three matrices (red, green, and blue), wouldn’t the logical representation be channels * height * width?
Hi @Med-akraou,
Both channels-first and channels-last are valid representations of a multichannel image.
This article explains why channels-last is preferred in PyTorch for the sake of better performance.
Unfortunately, I can’t find a similar discussion for TensorFlow at the moment, so you might need to do your own search if you would like to check out some benchmarks.
This doesn’t really explain why the course chose channels-last, but it does suggest how to think about it: the choice should depend mainly on performance.
Cheers,
Raymond
PS: TensorFlow supports both in many CNN operations, so you could test them out and see which runs faster.
You’re right that there are two ways to format images: “channels first” and “channels last”. In this course, they have chosen the “channels last” orientation, possibly because that is the default orientation used by TensorFlow. And when you have a batch of inputs, the first dimension is the “samples” dimension. So we have 4D tensors with dimensions
samples x height x width x channels
The orientation is a choice, but the choice has been made for us by the course staff in this case. When you are working on your own, you can make a different choice.
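To make the "samples x height x width x channels" layout concrete, here is a minimal NumPy sketch. The batch size and image dimensions are made-up values chosen only for illustration, not anything taken from the course.

```python
import numpy as np

# Hypothetical batch: 8 RGB images, each 64 pixels high and 48 pixels wide.
# Axis order is channels-last: (samples, height, width, channels).
batch = np.zeros((8, 64, 48, 3), dtype=np.float32)

samples, height, width, channels = batch.shape

# Indexing the first axis picks one image, still in height x width x channels.
first_image = batch[0]            # shape (64, 48, 3)

# Indexing the last axis picks one colour plane of that image.
red_plane = first_image[..., 0]   # shape (64, 48)
```

In this layout the three colour values of a single pixel sit next to each other on the last axis, whereas in channels-first each colour plane is stored as a whole matrix.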
Therefore, should we transform images from channels * height * width to height * width * channels?
Test which works better in your case, and justify your decision by test results.
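If you do need to convert between the two layouts, one possible way is to permute the axes with NumPy. This is only a sketch with a made-up 3 x 4 x 5 array; note that it uses a transpose, since a plain reshape would keep the element order in memory and scramble the pixels.

```python
import numpy as np

# Hypothetical channels-first image: 3 channels, 4 rows, 5 columns.
chw = np.arange(3 * 4 * 5).reshape(3, 4, 5)

# channels-first -> channels-last: move axis 0 (channels) to the end.
hwc = np.transpose(chw, (1, 2, 0))    # shape (4, 5, 3)

# channels-last -> channels-first: move the last axis back to the front.
back = np.transpose(hwc, (2, 0, 1))   # shape (3, 4, 5)

# The same idea for a batch: (samples, C, H, W) -> (samples, H, W, C).
nchw = chw[np.newaxis]                    # shape (1, 3, 4, 5)
nhwc = np.transpose(nchw, (0, 2, 3, 1))   # shape (1, 4, 5, 3)
```

The value at channel c, row h, column w of the original ends up at row h, column w, channel c of the converted array, so no pixel data is changed, only the axis order.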
Cheers,
Raymond