Hello everyone.
I have a question about the MaxPooling2D layer.
In the Deep Learning Specialization, Andrew said that if we have a 512 × 512 pixel image and a 3 × 3 filter, the dimensions of the output image will be (512 − 3 + 1) × (512 − 3 + 1). But in the examples here, the dimensions of the output image are equal to half the dimensions of the input image. Why is that?
Hello @mahyaalizadeh,
MaxPooling2D downsamples the spatial dimensions of its input, channel by channel. In this case, since the layer is:
tf.keras.layers.MaxPooling2D(2,2)
the input dimensions get halved. Basically, you take the biggest value in every 2 × 2 square of the input image.
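Here's a minimal sketch you can run to see it (the input shape is just an arbitrary example):

import tensorflow as tf

# A dummy batch of one 512x512 RGB image; the values don't matter here.
x = tf.random.uniform((1, 512, 512, 3))

# pool_size=(2, 2); strides defaults to pool_size, so every 2x2 square
# is reduced to its maximum and the spatial dimensions are halved.
y = tf.keras.layers.MaxPooling2D(2, 2)(x)

print(x.shape)  # (1, 512, 512, 3)
print(y.shape)  # (1, 256, 256, 3)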
PS: sorry, but I don’t remember in which part of his course Andrew said that.
Best,
Maurizio
Hi @mahyaalizadeh,
As Maurizio says, this layer downsamples the input along its spatial dimensions by taking the maximum value over an input window. If you remember, Andrew mentions input shape, pool size, stride, and padding when discussing this topic.
Let’s look at our model; this is what you get when summarising it.
According to the API documentation for MaxPooling2D:
output_shape = math.floor((input_shape - pool_size) / strides) + 1
For argument's sake, let's take the first max pooling layer:
output_shape = math.floor((148 - 2) / 2) + 1 = 74
Notice that, according to the documentation, strides defaults to pool_size; that's why the input gets halved.
For Andrew's case (a 3 × 3 window sliding with stride 1), the formula gives:
output_shape = math.floor((512 - 3) / 1) + 1 = 510
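If you want to verify both cases yourself, here's a quick sketch (the shapes are just the ones discussed above):

import tensorflow as tf

# Default strides = pool_size: 148 -> floor((148 - 2) / 2) + 1 = 74
x = tf.random.uniform((1, 148, 148, 3))
print(tf.keras.layers.MaxPooling2D(pool_size=2)(x).shape)
# (1, 74, 74, 3)

# Explicit stride of 1 with a 3x3 window: 512 -> floor((512 - 3) / 1) + 1 = 510
x = tf.random.uniform((1, 512, 512, 3))
print(tf.keras.layers.MaxPooling2D(pool_size=3, strides=1)(x).shape)
# (1, 510, 510, 3)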
Hope it helps