MaxPooling2D - layer size and related formulae

Hi, Anthony.

Now that I think about it a bit more, we can “back calculate” the image dimensions by using the same technique as for computing the output dimensions of transpose convolutions.

At the FC layer, we can see that there are 32 input channels, so the images at that point are 8 * 8. The MaxPool layer halves each spatial dimension, so before it they were 16 * 16.

If we apply the transpose convolution formula (see this thread, which links to this one):

n_{out} = (n_{in} - 1) * s + f - 2p

With n_{in} = 16, f = 3, s = 1, and p = 1, we get:

n_{out} = (16 - 1) * 1 + 3 - 2*1 = 16
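
In case it helps, here is a minimal Python sketch of that back calculation. The function name `transpose_conv_out` is just something I made up for illustration, not a library API, and it only encodes the formula above for one spatial dimension.

```python
def transpose_conv_out(n_in, f, s=1, p=0):
    """Size along one spatial dimension after a transpose convolution.

    Implements n_out = (n_in - 1) * s + f - 2p from the formula above.
    Hypothetical helper for illustration, not part of any framework.
    """
    return (n_in - 1) * s + f - 2 * p


# Back-calculate the conv layer's input size from its 16-pixel output,
# using f = 3, s = 1, p = 1 as in the post.
n_before_conv = transpose_conv_out(16, f=3, s=1, p=1)
print(n_before_conv)
```

With stride 1 and "same"-style padding (p = 1 for f = 3), the size does not change, which is why the answer comes back as 16.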

The same calculation applies to the first conv layer, so it looks like the original inputs must have been 16 * 16. Does that make sense based on the context?

We can check our work by using the “forward” convolution formula:

n_{out} = \displaystyle \lfloor \frac {n_{in} + 2p - f}{s} \rfloor + 1

Which gives:

n_{out} = \displaystyle \lfloor \frac {16 + 2 * 1 - 3}{1} \rfloor + 1 = 16
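
The forward check can also be sketched in Python. Note the assumptions: I'm reading the network as two conv layers (f = 3, s = 1, p = 1) followed by a MaxPool, and I'm assuming the pool is 2 * 2 with stride 2, since that is what would take 16 down to 8. The helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(n_in, f, s=1, p=0):
    """Forward convolution size: floor((n_in + 2p - f) / s) + 1."""
    return (n_in + 2 * p - f) // s + 1


def pool_out(n_in, f=2, s=2):
    """Pooling size with no padding: floor((n_in - f) / s) + 1.

    The 2 * 2, stride 2 defaults are an assumption, not taken from the model.
    """
    return (n_in - f) // s + 1


n = 16                          # back-calculated input size
n = conv_out(n, f=3, s=1, p=1)  # first conv layer keeps the size
n = conv_out(n, f=3, s=1, p=1)  # second conv layer keeps the size
n = pool_out(n)                 # assumed 2 * 2 MaxPool halves it
print(n)
```

If the real pooling layer uses a different window or stride, the last step would need to change accordingly.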