Around 2:06, Andrew talks about max pooling in the example. Shouldn't the number of channels in the pooled output be the same as the number of channels in the input, since pooling doesn't change the number of channels? Please correct me if I am wrong.
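A quick way to check the premise is a minimal numpy sketch of 3x3 max pooling with stride 1 and "same" padding. It assumes a 28x28x192 input (the input size of inception (3a) in the paper); the function name `max_pool_same` is hypothetical. Because the max is taken per channel, the channel count passes through unchanged.

```python
import numpy as np

def max_pool_same(x, k=3):
    """k x k max pooling, stride 1, 'same' padding, on an H x W x C array.

    The max is taken over each spatial window separately for every channel,
    so the output has the same H, W and C as the input.
    """
    h, w, c = x.shape
    pad = k // 2
    # Pad only the spatial dims with -inf so padded cells never win the max.
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), constant_values=-np.inf)
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            # Reduce over the two spatial axes of the window, keep channels.
            out[i, j, :] = xp[i:i + k, j:j + k, :].max(axis=(0, 1))
    return out

x = np.random.rand(28, 28, 192)  # assumed input size (inception 3a)
print(max_pool_same(x).shape)    # (28, 28, 192)
```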
Thank you
I think you are right. I guess Andrew skipped explaining the actual implementation of Inception.
Here is a figure from the original paper.
The naive version is simply four parallel paths: 1x1 conv, 3x3 conv, 5x5 conv, and 3x3 max pooling. The actual Inception module, however, adds dimension reduction, as in the right-hand figure.
In that version, a 1x1 convolution (most likely with 32 filters) follows the 3x3 max pooling, which results in a 28x28x32 output. The max pooling itself keeps the input's channel count; it is the 1x1 convolution after it that shrinks the channels to 32.
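The channel reduction in the pooling path can be illustrated with a short numpy sketch. A 1x1 convolution is a per-pixel linear map across channels, so it changes only the channel count. The 192-channel pooled input, the random weights, and the name `conv1x1` are assumptions for illustration, not values from the paper.

```python
import numpy as np

def conv1x1(x, weights, bias):
    """1x1 convolution on an H x W x C_in array, followed by ReLU.

    weights has shape (C_in, C_out): each pixel's C_in-vector is mapped to a
    C_out-vector, so H and W stay the same and only the channels change.
    """
    return np.maximum(x @ weights + bias, 0.0)

rng = np.random.default_rng(0)
pooled = rng.random((28, 28, 192))         # stand-in for the max-pool output (assumed 192 channels)
w = rng.standard_normal((192, 32)) * 0.01  # hypothetical weights for 32 filters
b = np.zeros(32)
print(conv1x1(pooled, w, b).shape)         # (28, 28, 32)
```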
I think inception (3a) is the module Andrew picked. It is an implementation of the right-hand version:
- The 1st path is a single 1x1 convolution with 64 filters.
- The 2nd path is a 1x1 convolution with 96 filters, then a 3x3 convolution with 128 filters.
- The 3rd path is a 1x1 convolution with 16 filters, then a 5x5 convolution with 32 filters.
- The 4th path is a 3x3 max pooling, then a 1x1 convolution with 32 filters.
Concatenating the four paths gives 64 + 128 + 32 + 32 = 256 channels in total (a 28x28x256 output).
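The concatenation step can be sketched in numpy using the output channel counts from the list above. Zero arrays stand in for the real path outputs, and it assumes, as in the paper, that every path preserves the 28x28 spatial size so the outputs can be stacked along the channel axis.

```python
import numpy as np

H, W = 28, 28
# Output channels of each path in inception (3a), taken from the list above.
path_out_channels = {
    "1x1": 64,
    "1x1 (96) -> 3x3": 128,
    "1x1 (16) -> 5x5": 32,
    "3x3 max pool -> 1x1": 32,
}

# Zero arrays stand in for the real path outputs; all share the 28x28 size.
outputs = [np.zeros((H, W, c)) for c in path_out_channels.values()]
block_out = np.concatenate(outputs, axis=-1)  # stack along the channel axis
print(block_out.shape)                        # (28, 28, 256)
```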
So, you are right; I think Andrew skipped this point.