Week 3 Assignment 2: Padding Confusion

Hello, I’m having some difficulty understanding the detailed statement of the U-net’s encoder stage:

“The contracting path follows a regular CNN architecture, with convolutional layers, their activations, and pooling layers to downsample the image and extract its features. In detail, it consists of the repeated application of two 3 x 3 unpadded convolutions, each followed by a rectified linear unit (ReLU) and a 2 x 2 max pooling operation with stride 2 for downsampling. At each downsampling step, the number of feature channels is doubled.”

Conceptually, I believed that the pooling layers alone were responsible for downsampling the volumes, but the text above refers to unpadded 3x3 conv layers, which would also shrink the spatial dimensions. On top of that, the exercise goes on to use “same” padding, which seems to contradict the quoted statement.

Is this a possible typo or am I misunderstanding something crucial?

Thanks,
Dylan

Hi,
It does look contradictory. By the way, what is the value of ‘stride’ in the 3x3 convolutions? If the stride is greater than 1, the convolutions will downsample as well.
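
To make the size arithmetic concrete, here is a minimal sketch using the standard output-size formulas for one spatial dimension: floor((n - k) / s) + 1 for unpadded ("valid") layers and ceil(n / s) for "same" padding. The helper name conv_out and the 96-pixel input are assumptions for illustration only; they are not taken from the assignment.

```python
import math

def conv_out(n, kernel, stride=1, padding="valid"):
    """Output size along one spatial dimension for a conv or pooling layer."""
    if padding == "same":
        # "same" padding: output size depends only on the stride
        return math.ceil(n / stride)
    # "valid" (unpadded): the kernel must fit entirely inside the input
    return (n - kernel) // stride + 1

n = 96  # hypothetical input height/width

# Two unpadded 3x3 convs, stride 1: each one trims 2 pixels (96 -> 94 -> 92)
valid = conv_out(conv_out(n, 3), 3)

# Two "same"-padded 3x3 convs, stride 1: size is unchanged (96 -> 96 -> 96)
same = conv_out(conv_out(n, 3, padding="same"), 3, padding="same")

# 2x2 max pooling with stride 2 halves whatever it receives
pooled_valid = conv_out(valid, 2, stride=2)
pooled_same = conv_out(same, 2, stride=2)

# A 3x3 conv with stride 2 would itself downsample, even with "same" padding
strided = conv_out(n, 3, stride=2, padding="same")

print(valid, same, pooled_valid, pooled_same, strided)
```

Under these assumptions, unpadded stride-1 convolutions only trim a 2-pixel border each, while the halving comes from the pooling step. With “same” padding and stride 1, the convolutions leave the size untouched and pooling is the only source of downsampling.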