In the conv_forward implementation, the padding function is applied before iterating over the training examples. That means only the original input X gets padded, not any of the intermediate activations.
If that’s the case, does SAME padding still serve one of its purposes, i.e. keeping the input and output dimensions the same?
Padding specifies any changes required to the input before performing the conv operation.
Same padding pads the input such that input.shape == output.shape, but only when
strides=1. You can read about it here
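To make the stride caveat concrete, here is a minimal sketch of the standard conv output-size formula (the function name and the example sizes are just for illustration, not from the assignment):

```python
import math

def conv_output_size(n, f, p, s):
    """Standard conv output size: floor((n + 2p - f) / s) + 1."""
    return math.floor((n + 2 * p - f) / s) + 1

# "same" padding for an odd filter size f uses p = (f - 1) / 2
n, f = 28, 3
p_same = (f - 1) // 2

print(conv_output_size(n, f, p_same, s=1))  # 28 -> shape preserved
print(conv_output_size(n, f, p_same, s=2))  # 14 -> stride 2 shrinks it anyway
```

So with stride 1 the "same" amount of padding exactly cancels the filter's shrinkage, but any stride greater than 1 still reduces the output size.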
When it comes to a conv layer, any activation is performed after the convolution operation. The post-conv activation won’t change the output shape.
I understand the first part: input.shape == output.shape holds for the first layer.
input -> Layer1 -> Layer2
What same padding does is make sure input.shape == Layer1.shape, but it can’t ensure Layer1.shape == Layer2.shape, because the padding only applies to the input, not to Layer1. Is that right?
The padding parameter is specific to a layer. Unless you specify
padding="same" for layer2, layer1.output.shape != layer2.output.shape.
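Since padding is a per-layer choice, you can trace what happens to the shapes when two layers make different choices. A toy walk-through (the sizes here are made up for illustration):

```python
def conv_output_size(n, f, p, s=1):
    """Conv output size: (n + 2p - f) // s + 1."""
    return (n + 2 * p - f) // s + 1

n = 32   # input height/width (hypothetical)
f = 3    # filter size used in both layers

# Layer 1 with "same" padding (p = (f - 1) // 2) keeps the size at 32.
layer1 = conv_output_size(n, f, p=(f - 1) // 2)
# Layer 2 with "valid" padding (p = 0) shrinks it to 30.
layer2 = conv_output_size(layer1, f, p=0)

print(layer1, layer2)  # 32 30
```

Layer 1's "same" padding does nothing for layer 2; only specifying padding="same" on layer 2 itself would keep its output at 32.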
So padding is only used for the input layer, not later layers. Right? But the input layer and the conv layers have the same general form, so I still don’t quite understand why padding only the input layer can keep the height and width the same in every layer.
Please read what Balaji said again: he specifically said that the padding action is applied “per layer”. At each layer you have to specify whether you want “same” or “valid” padding. In some layers you may want “same” padding, but probably not in every layer.
Note that in a ConvNet that is producing some kind of classification, the final output by definition has a different shape than the input, right? Even if you start with 256 x 256 x 3 RGB images, if your goal is to identify whether the image contains a cat, dog, kangaroo or elephant, then the output will be a 4 x 1 softmax vector, right? So at some point in the process (and probably more than one point) the output shape will have smaller height and width dimensions than the input. That is the typical pattern: the height and width reduce and the channel dimension increases as you go through the various layers.
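That typical pattern can be sketched with the output-size formula. The layer specs below are invented for illustration (not any real architecture): three strided conv layers with "same"-style padding, followed conceptually by a dense softmax:

```python
def conv_out(n, f, p, s):
    """Conv output size: (n + 2p - f) // s + 1."""
    return (n + 2 * p - f) // s + 1

h = w = 256   # start from a 256 x 256 x 3 RGB image
c = 3

# Made-up layer specs: (filter, pad, stride, out_channels)
for f, p, s, nc in [(3, 1, 2, 16), (3, 1, 2, 32), (3, 1, 2, 64)]:
    h = conv_out(h, f, p, s)
    w = conv_out(w, f, p, s)
    c = nc
    print((h, w, c))
# Height/width shrink while channels grow:
# (128, 128, 16) -> (64, 64, 32) -> (32, 32, 64)
# A final dense + softmax would then map this to a 4 x 1 vector of class scores.
```

Even with padding on every layer, stride 2 halves the spatial dimensions each time, which is exactly the funnel from a large image down to a small classification vector.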
In our programming assignment conv_forward, the padding is only applied once, to the inputs; but in other networks, same padding can be applied selectively to some layers. In VGG-16, all conv layers use same padding to retain the same shape, and in AlexNet, some layers use a 3x3 filter with same padding. So same padding can be applied to any layer, per the author’s choice.
@paulinpaloalto Thanks for the clarification.
Remember that the point of
conv_forward is that it is just the operation at one conv layer, right? So, yes, you have a bunch of parameters that are set for each layer individually, including padding type, size of the filters, number of filters, etc.
In other words the code we wrote is just the equivalent of one invocation of the
Conv2D layer of Keras. Any ConvNet has more than one layer, right?
Yes, that’s true! The assignment is for one layer only.