Shouldn’t we take the last output of the last conv block during encoding? Why the first?
As a belated reply: each block in the downsampling path returns two outputs. The first, next_layer, has had max pooling applied, so it is downsampled. The second is not downsampled and is passed to the expanding path as the skip connection. So the next block in the downsampling path should take the first output of the previous block, i.e. next_layer.
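To make the two outputs concrete, here is a minimal sketch in plain Python, not the actual code from the thread. The names conv_block, max_pool_2x2 and skip_connection are my own assumptions, the convolutions are replaced by an identity stand-in, and the feature map is a 2D list with even height and width.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D list (even height and width assumed)."""
    pooled = []
    for r in range(0, len(feature_map), 2):
        row = []
        for c in range(0, len(feature_map[0]), 2):
            row.append(max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ))
        pooled.append(row)
    return pooled


def conv_block(inputs):
    """Hypothetical encoder block returning (next_layer, skip_connection)."""
    # Assumption: identity stand-in for the block's convolutions, to keep this short.
    conv = [row[:] for row in inputs]
    skip_connection = conv           # full resolution, goes to the expanding path
    next_layer = max_pool_2x2(conv)  # downsampled, goes to the next encoder block
    return next_layer, skip_connection


# 4x4 toy input with values 0..15
image = [[r * 4 + c for c in range(4)] for r in range(4)]

block1 = conv_block(image)
block2 = conv_block(block1[0])  # index 0 is next_layer, the pooled output

print(len(block1[0]), len(block1[1]))  # pooled vs. full resolution in block 1
print(len(block2[0]), len(block2[1]))  # pooled vs. full resolution in block 2
```

Feeding block1[1] into the second block instead would skip the pooling step, so the feature map would never shrink along the encoder.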