In assignment one (week 2) you write:
- Zero-padding pads the input with a pad of (3,3).
- Stage 1:
  - The 2D Convolution has 64 filters of shape (7,7) and uses a stride of (2,2).
  - BatchNorm is applied to the ‘channels’ axis of the input.
  - ReLU activation is applied.
  - MaxPooling uses a (3,3) window and a (2,2) stride.
- Stage 2:
  - The convolutional block uses three sets of filters of size [64,64,256], “f” is 3, and “s” is 1.
  - The 2 identity blocks use three sets of filters of size [64,64,256], and “f” is 3.
- Stage 3:
  - The convolutional block uses three sets of filters of size [128,128,512], “f” is 3, and “s” is 2.
  - The 3 identity blocks use three sets of filters of size [128,128,512], and “f” is 3.
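For context, this is roughly how my convolutional block consumes “f”, “s”, and the filters list (a simplified sketch of what I submitted, with initializers and layer names left out; the identity block is the same main path with s = 1 and no shortcut conv):

```python
from tensorflow.keras import layers

def convolutional_block(X, f, filters, s):
    """Simplified ResNet convolutional block.
    filters = [F1, F2, F3] are the NUMBERS of filters in each conv;
    the spatial kernel sizes are fixed at (1,1), (f,f), (1,1)."""
    F1, F2, F3 = filters
    X_shortcut = X

    # Main path: 1x1 conv, downsampling via stride s
    X = layers.Conv2D(F1, (1, 1), strides=(s, s))(X)
    X = layers.BatchNormalization(axis=3)(X)
    X = layers.Activation('relu')(X)

    # f x f conv; 'same' padding keeps the spatial size
    X = layers.Conv2D(F2, (f, f), strides=(1, 1), padding='same')(X)
    X = layers.BatchNormalization(axis=3)(X)
    X = layers.Activation('relu')(X)

    # 1x1 conv expanding to F3 output channels
    X = layers.Conv2D(F3, (1, 1), strides=(1, 1))(X)
    X = layers.BatchNormalization(axis=3)(X)

    # Shortcut path: 1x1 conv so the Add sees matching shapes
    X_shortcut = layers.Conv2D(F3, (1, 1), strides=(s, s))(X_shortcut)
    X_shortcut = layers.BatchNormalization(axis=3)(X_shortcut)

    return layers.Activation('relu')(layers.Add()([X, X_shortcut]))
```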
I passed the coding fine, and I understand the network architecture. What I don’t understand is why, in stages 2 and 3, the filters are of size [64,64,256] and then double. Isn’t a size of (64, 64) too big for the actual size of the network at that point?
The actual layer sizes, when we look at the model, are:
(None, 15, 15, 64)
(None, 15, 15, 64)
(None, 15, 15, 64)
(None, 15, 15, 256)
(None, 15, 15, 256)
(None, 15, 15, 256)
(None, 8, 8, 128)
(None, 8, 8, 128)
(None, 8, 8, 512)
(None, 8, 8, 512)
(None, 8, 8, 512)
(None, 4, 4, 256)
(None, 4, 4, 256)
(None, 4, 4, 1024)
(None, 4, 4, 1024)
(None, 4, 4, 1024)
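Here is a quick standalone check that mimics the (None, 15, 15, 256) → (None, 8, 8, 128) transition in the list above (my own experiment, not code from the notebook):

```python
import tensorflow as tf

# Conv2D's first argument is the NUMBER of filters, i.e. the output
# channel count; the kernel's spatial size is the separate (1, 1) tuple.
x = tf.random.normal((1, 15, 15, 256))
y = tf.keras.layers.Conv2D(128, (1, 1), strides=(2, 2))(x)
print(y.shape)  # (1, 8, 8, 128)
```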
So the last number follows the number of channels that we set up, while the layer size itself is decreasing. Yet we specify 64, then 128, then 256.
Why do we specify the convolutional layers to have larger and larger sizes (64, 64), then (128, 128)?
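For completeness, the decreasing height/width is the part I do follow; this back-of-envelope check (assuming the notebook’s 64x64x3 input, which isn’t shown above) reproduces the 15 → 8 → 4 progression:

```python
# Standard conv/pool output size: floor((n + 2*pad - f) / s) + 1
def conv_out(n, f, s, pad=0):
    return (n + 2 * pad - f) // s + 1

n = conv_out(64, f=7, s=2, pad=3)  # stage-1 conv (7x7, stride 2) -> 32
n = conv_out(n, f=3, s=2)          # stage-1 maxpool (3x3, stride 2) -> 15
n = conv_out(n, f=1, s=2)          # stage-3 conv block, s = 2 -> 8
n = conv_out(n, f=1, s=2)          # next stage's conv block, s = 2 -> 4
print(n)  # 4
```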
Thanks in advance,