Is this a small error?

I don’t want to appear a pedant, but is this an error in the U-Net assignment or am I confused?

The contracting path follows a regular CNN architecture, with convolutional layers, their activations, and pooling layers to downsample the image and extract its features. In detail, it consists of the repeated application of two 3 x 3 unpadded convolutions, each followed by a rectified linear unit (ReLU) and a 2 x 2 max pooling operation with stride 2 for downsampling. At each downsampling step, the number of feature channels is doubled.

I’m referring specifically to where it says the convolutions are unpadded. The diagram seems to indicate that the output of each convolution has the same height and width as its input prior to downsampling, which would imply “same” padding, but unpadded would imply “valid” padding, right?

Interesting. I think you’re right that the wording is inconsistent with both the diagram and the code. You can see that within each “level” of the downsampling path, the spatial dimensions stay the same, which would imply “same” padding, and the code instructions also have us use “same” padding. I agree that “unpadded” would imply no padding (aka “valid” padding). I’ll file a bug about this.
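If it helps to see the difference concretely, here is a minimal sketch (my own illustration, not code from the assignment) comparing the two padding modes in Keras; the 128 x 128 x 3 input size is just picked to match the diagram:

```python
import tensorflow as tf

x = tf.random.normal((1, 128, 128, 3))  # dummy batch of one 128 x 128 RGB image

same = tf.keras.layers.Conv2D(64, 3, padding="same")(x)
valid = tf.keras.layers.Conv2D(64, 3, padding="valid")(x)

print(same.shape)   # (1, 128, 128, 64) -> spatial size preserved ("same")
print(valid.shape)  # (1, 126, 126, 64) -> an unpadded 3 x 3 conv shrinks each spatial dim by 2
```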

Thanks for pointing that out!

@Tim_Bate @paulinpaloalto

I ran into the same confusion trying to understand the maths behind the convolution + ReLU layers. Using the first layer as an example, this is how I understand it:

  • we have an input image of dimensions (128, 128, 3)
  • we have x filters of dimensions (3, 3, 3), where x = 64 (the depth of the output image)
  • we obtain an image of dimensions (128, 128, 64)

The input and output images have the same height and width, meaning that a “same” convolution must have been used. Using the floor((n + 2p - f) / s) + 1 formula from the lecture videos, you can then work out that both the padding p and the stride s are set to 1.
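Just to double-check that arithmetic, here is a tiny helper I wrote (my own code, not from the notebook) that evaluates the lecture formula:

```python
import math

def conv_output_size(n, f, p, s):
    """Output height/width of a convolution: floor((n + 2p - f) / s) + 1."""
    return math.floor((n + 2 * p - f) / s) + 1

print(conv_output_size(128, 3, 1, 1))  # 128 -> "same": size preserved with p = 1
print(conv_output_size(128, 3, 0, 1))  # 126 -> "valid": size shrinks with p = 0
```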

What I do not understand, though, is why the convolution is applied twice in each block. Wouldn’t it be enough to do it just once (after all, you already end up with the (128, 128, 64) image then)? Or is it simply that adding this second convolution increases the number of learnable parameters, making the U-Net bigger and supposedly better?

An additional remark: I really appreciate the time and effort that go into making the notebooks, but unfortunately they are not free of errors and (terminological) inconsistencies that can really get you stuck until you realise that the information given might actually be incorrect. It has happened to me several times. I guess that’s part of the deal, though, and I am really happy that some learners (like @Tim_Bate) don’t just accept these errors but raise them on this board so that we can all learn from them, and that the Coursera mentors (kudos to @paulinpaloalto in this case) reply so quickly. Once again, looking at the board has gotten me unstuck!

Well, just because the shape ends up the same does not mean that two conv layers back to back have the same effect as one, right? It is exactly what you say: with two layers you get a much more complex function. As with most choices in network architecture, it is what has been found to work. You try it with one, you try it with two, you try it with three and so forth, and then you strike a balance between the compute/storage cost of training and the quality of the results.
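To make that concrete, here is a rough sketch (my own code, not the assignment’s implementation) of what one “level” of the contracting path looks like, with the two stacked convolutions followed by pooling:

```python
import tensorflow as tf

def double_conv(x, n_filters):
    """Two 3 x 3 "same" convolutions with ReLU, as in one U-Net encoder level."""
    x = tf.keras.layers.Conv2D(n_filters, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.Conv2D(n_filters, 3, padding="same", activation="relu")(x)
    return x

inputs = tf.keras.Input(shape=(128, 128, 3))
x = double_conv(inputs, 64)                         # (128, 128, 64) after both convs
x = tf.keras.layers.MaxPooling2D(2, strides=2)(x)   # (64, 64, 64) after downsampling
```

The second convolution does not change the shape, but it composes another learned nonlinearity on top of the first, which is where the extra expressive power comes from.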

That said, it is also the case that the search space here is pretty intractable, meaning “too many choices” versus finite time, so who knows how thoroughly they explored the possibilities. In any specific case, e.g. U-Net here, you could try reading the original papers; perhaps they give some insight into how they arrived at the final architecture they settled on.

Regarding the course materials, there is a lot here and the quality of the information is very high in general, but nothing is ever perfect. The course staff are very busy and have other responsibilities (e.g. producing rich new courses like the whole MLS specialization), but they will fix bugs that get reported. Of course the timing of the fixes depends on the usual triage and resource constraints. Please report anything you see, as Tim has done here, and the mentors can file the bugs.

Of course it’s also valuable to post your findings here on Discourse in any case: even if the bugs may sometimes take a while to get fixed, having the information published here may help others who come along later. To be fair, they are usually pretty responsive with anything serious.

Thanks again, Paul.

Please report anything you see, as Tim has done here, and the mentors can file the bugs.

Will do!

Of course it’s also valuable to post your findings here on Discourse in any case: even if the bugs may sometimes take a while to get fixed, having the information published here may help others who come along later.

Yes, of course. I have benefitted from that myself a few times already 🙂
