In Fig. 4 of the assignment Image Segmentation with U-Net (W3A2), the input to the first layer of the decoder is named “Bottleneck output”. On the other hand, in the description of Exercise 3 - unet_model, the instructions for the second half of the U-Net say:
Use cblock5 as expansive_input and cblock4 as contractive_input, with
n_filters * 8. This is your bottleneck layer.
That would mean we should consider “ublock6” in unet_model to be the bottleneck layer.
This is a bit confusing: which one is the bottleneck layer here? And why is it called a bottleneck at all, given that there seems to be no compression followed by expansion of dimension?
It’s an interesting point, but it’s purely a question of nomenclature. It doesn’t really have any bearing on what is actually happening or how you write the code, right?
You’re right that it would probably make the most sense to call the skinniest section “the bottleneck”. But is that cblock5 or ublock6? The first upsampling block, ublock6, takes the “skinniest” input and starts the “reinflation” process, whereas cblock5 takes a larger input and shrinks it to the “skinniest” output. So which one makes the most sense to call “the bottleneck”? I’d say it’s the pair cblock5 + ublock6 that gives you the “U” shape at its lowest point.
Maybe that’s the real point here: if you look at the picture of a U-Net early in the notebook, what is that skinniest section at the bottom? It’s the output of cblock5 and the input of ublock6. That’s “the bottleneck”, but it’s not really a “layer” per se: it’s the output of one layer and the input to the next.
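To make that concrete, here is a minimal sketch that only does shape bookkeeping for the blocks named above. It is not the notebook's code. The input size (96x128), n_filters = 32, the channel multipliers, and the idea that cblock1 through cblock4 end in 2x2 max pooling while cblock5 does not are all assumptions on my part, and trace_unet_shapes is a made-up helper name.

```python
# Shape bookkeeping only; no deep learning framework is needed.
# Assumed (not stated in this thread): 96x128 input, n_filters = 32,
# 2x2 max pooling after cblock1..cblock4 and none after cblock5.

def trace_unet_shapes(height=96, width=128, n_filters=32):
    """Return a list of (block_name, (height, width, channels)) outputs."""
    shapes = []
    h, w = height, width
    # Contracting path: channels double per block, pooling halves h and w.
    for i in range(1, 6):
        channels = n_filters * 2 ** (i - 1)
        # Record the pre-pooling output (the one used as a skip connection).
        shapes.append((f"cblock{i}", (h, w, channels)))
        if i < 5:  # assumed: cblock5 does not pool
            h, w = h // 2, w // 2
    # Expanding path: each transposed convolution doubles h and w.
    for i, mult in zip(range(6, 10), (8, 4, 2, 1)):
        h, w = h * 2, w * 2
        shapes.append((f"ublock{i}", (h, w, n_filters * mult)))
    return shapes


shapes = trace_unet_shapes()
for name, shape in shapes:
    print(f"{name}: {shape}")

# The block whose output has the smallest height * width.
narrowest = min(shapes, key=lambda item: item[1][0] * item[1][1])
print("smallest spatial size:", narrowest)
```

Under those assumptions the output of cblock5 comes out as (6, 8, 512), and ublock6 brings it back up to (12, 16, 256). So the compression the question is looking for is in height and width, not in the number of channels, which actually peaks at the bottom of the “U”.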