As far as I understand from the course, we only applied max pooling within each channel (i.e., taking the max over a 2-dimensional window), so the number of channels doesn't change.
In the video about the Inception Network, we apply max pooling at the end, obtaining rather many channels (192), and then shrink the number of channels with a 1x1 convolution.
My question is, would it make any sense to do max pooling not just inside one channel but going in depth too (similarly to convolution)? This could, for example, decrease the number of channels directly in the inception case.
Or would taking the max across different channels confuse the features from different channels?
I think this is right: information from different channels would be mixed in some way, but if you decode it somehow at the end of the network, maybe it can still be helpful.
Hi @romcs
Yes, your intuition is correct. Extending max pooling across channels can reduce the number of channels directly. This approach is called “depth-wise” max pooling.
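To make the idea concrete, here is a minimal NumPy sketch of pooling across the channel dimension. The function name, shapes, and group size are illustrative assumptions, not from the course or any framework: we take the max over groups of adjacent channels, so a (C, H, W) feature map shrinks to (C // pool_size, H, W) while spatial dimensions stay untouched.

```python
import numpy as np

def depthwise_max_pool(x, pool_size):
    """Illustrative depth-wise max pooling (an assumption, not a standard API).

    Reduces channels by taking the max over groups of `pool_size`
    adjacent channels. x has shape (C, H, W) with C divisible by pool_size;
    the result has shape (C // pool_size, H, W).
    """
    c, h, w = x.shape
    assert c % pool_size == 0, "channel count must be divisible by pool_size"
    # Group channels, then take the max within each group.
    return x.reshape(c // pool_size, pool_size, h, w).max(axis=1)

# Example: shrink the 192 channels of the Inception pooling branch to 32
# (pool_size=6 is an arbitrary illustrative choice).
x = np.random.randn(192, 28, 28)
y = depthwise_max_pool(x, pool_size=6)
print(y.shape)  # (32, 28, 28)
```

Note that, unlike the 1x1 convolution used in the Inception module, this reduction has no learnable weights: which channels get grouped together is fixed, which relates to the drawbacks discussed below.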
However, max pooling across channels could blur or mix distinct features from different channels, which reduces the discriminative power of the features. Additionally, for some types of data it could disrupt the hierarchical representation learned by the convolutional layers.
Hope this helps!