In the Training section of the assignment I find it a bit strange that initial_shape (of the input image) differs from target_shape (of the output image). I would then expect the labels to already have the shape target_shape, but instead they are cropped down to target_shape, which removes a non-negligible part of the original label image.
Any idea why?
(I have assumed here that the image size in the “volumes” inputs i.e. input_shape (512) is the same as the image size of the original “labels” before these are cropped to target_shape (373))
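If it helps to see where the shrinkage comes from, here is my rough understanding as a sketch: every unpadded 3 x 3 convolution loses 2 pixels, each 2 x 2 max-pool halves the size, and each up-conv doubles it. (I'm using the 572 → 388 numbers from the original U-Net paper here, since I'm not sure of the exact depth the assignment uses to get 512 → 373.)

```python
def unet_output_size(size, depth=4):
    # Trace the spatial size through a classic valid-convolution U-Net
    # (Ronneberger et al., 2015): two 3x3 convs per block (each loses
    # 2 pixels), a 2x2 max-pool going down, a 2x2 up-conv coming up.
    for _ in range(depth):       # contracting path
        size = (size - 4) // 2   # two 3x3 convs, then pool
    size -= 4                    # two 3x3 convs at the bottleneck
    for _ in range(depth):       # expanding path
        size = size * 2 - 4      # up-conv, then two 3x3 convs
    return size

print(unet_output_size(572))  # 388, matching Figure 1 of the paper
```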
Hi @victor_popa,
It’s been too long since I looked at this assignment, so I would need to take a little while to catch myself up on this, but in the meantime, I noticed this in the assignment - does it help explain what you’re asking about?
Expanding Path
Next, you will implement the expanding blocks for the expanding path. This is the decoding section of U-Net, which includes several upsampling steps. In order to do this, you'll also need to write a crop function, so that you can crop the image from the contracting path and concatenate it to the current image on the expanding path—this forms a skip connection. Again, the details are from the paper (Ronneberger, 2015):
Every step in the expanding path consists of an upsampling of the feature map followed by a 2 x 2 convolution (“up-convolution”) that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3 x 3 convolutions, each followed by a ReLU. The cropping is necessary due to the loss of border pixels in every convolution.
Fun fact: later models based on this architecture often use padding in the convolutions to prevent the size of the image from changing outside of the upsampling / downsampling steps!
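To illustrate that fun fact — with `padding=1`, a 3 x 3 convolution leaves the spatial size unchanged, so no cropping is needed for the skip connections (a quick PyTorch check, not from the assignment itself):

```python
import torch
import torch.nn as nn

# A 3x3 convolution with padding=1 preserves the spatial dimensions,
# so contracting- and expanding-path feature maps stay aligned.
conv = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding=1)
x = torch.randn(1, 64, 128, 128)
print(conv(x).shape)  # torch.Size([1, 64, 128, 128])
```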