I'm wondering whether there is a common rule of thumb for the target output shape before the flatten layer in a model. For instance, most examples reach a (17, 17) shape after the last max pooling layer. How far can we filter down our image data, and what is the disadvantage of going to smaller output shapes?
Each max pooling layer keeps only the strongest value in each window, so as you add more of them the network focuses more and more on edges and high-intensity colors. That means it can neglect weaker colors or details that may still carry useful information. For that reason, max pooling is mainly used when the image has large dimensions: it reduces the size of the image and so speeds up the neural network. You keep applying max pooling until the image has small dimensions, but only as long as the image does not lose its important properties.
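To see how quickly the size shrinks, here is a minimal sketch in plain Python. It assumes a 150x150 input, 3x3 convolutions without padding, and 2x2 max pooling with stride 2. Those values are my assumptions for illustration (they are not stated in the question), and the function names are hypothetical helpers, not part of any library.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Spatial size after a convolution (standard output-size formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, pool=2):
    """Spatial size after max pooling with stride equal to the pool size."""
    return (size - pool) // pool + 1


def trace_shapes(size, num_blocks, kernel=3, pool=2):
    """Return the side length after each conv + max pooling block."""
    sizes = [size]
    for _ in range(num_blocks):
        size = pool_out(conv_out(size, kernel), pool)
        sizes.append(size)
    return sizes


# Assumed 150x150 input, four conv + pool blocks.
shapes = trace_shapes(150, 4)
print(shapes)  # [150, 74, 36, 17, 7]
```

Under these assumptions the third block gives 17x17 and a fourth gives 7x7. Each extra pooling stage roughly halves each side, so the number of spatial positions left to carry information drops by about a factor of four every time.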
I hope this answer helps you.
Please feel free to ask any questions.