In the video at around 3:30 the skip connection is depicted as skipping over the input and output “blocks”. Is this correct? I thought the residual connection should skip the “bottleneck” block.
I think you are just misinterpreting the graphical presentation there. I suggest you watch it again with the following idea in mind: the “bottleneck block” is not a single layer. It is a sequence of layers. The diagram they show has the “skip” connection diverging right before the “expansion” step that marks the start of the bottleneck block, and it rejoins right after the “shrinking” step that is the last step of the bottleneck block. So it is precisely skipping around the bottleneck block. That is exactly what they are depicting there.
The other point is that this is just one such instance. The full MobileNet consists of multiple of those blocks concatenated, plus some other layers as well.
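To make the structure concrete, here is a minimal Keras-style sketch of a single block as I am describing it. This is illustrative only: the input size, the expansion factor of 6, the ReLU6 activations, and the batch norm placement follow the MobileNetV2 paper, not necessarily the exact code in the assignment.

```python
import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_block(x, expansion=6):
    """One MobileNetV2-style bottleneck ("inverted residual") block (sketch)."""
    in_channels = x.shape[-1]
    skip = x  # the residual branch starts *before* the expansion step

    # Expansion: 1 x 1 convolution that increases the number of channels
    x = layers.Conv2D(expansion * in_channels, kernel_size=1,
                      padding='same', use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU(max_value=6.0)(x)

    # Depthwise 3 x 3 convolution: spatial filtering, channel count unchanged
    x = layers.DepthwiseConv2D(kernel_size=3, padding='same', use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU(max_value=6.0)(x)

    # Projection ("shrink"): 1 x 1 convolution back down to the input channel count
    x = layers.Conv2D(in_channels, kernel_size=1,
                      padding='same', use_bias=False)(x)
    x = layers.BatchNormalization()(x)

    # The skip rejoins *after* the projection, so it wraps the whole block
    return layers.Add()([skip, x])

inputs = tf.keras.Input(shape=(64, 64, 3))   # the "n x n x 3" input from the diagram
outputs = bottleneck_block(inputs)
tf.keras.Model(inputs, outputs).summary()
```

The key point is visible in the code: `skip` is captured before the expansion and added back after the projection, so the residual connection wraps the entire bottleneck block and nothing else.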
That’s exactly how it is shown in the first part of the video (prior to 3:30), which makes perfect sense. However, it appears that starting at 3:30 the skip connection picks up before the input block and plugs back in right after the output block.
My point is still that you are misinterpreting what you are seeing. What is the “input” block in your definition? I claim it is what I am calling the “expansion” block, which is the first step of the bottleneck layer. The skip connection takes the same input that the expansion block is getting, passes it around, and rejoins after the “shrink” step at the end of the bottleneck block.
Or maybe the simpler way to state my point: it looks like we are disagreeing about the definition of the “bottleneck block”. My claim is that it is exactly everything that is being skipped by the residual connection in that diagram. That’s the point.
Thanks, @paulinpaloalto! I understand what you are saying. Perhaps the diagram in the video loosely shows the arrow originating prior to the n x n x 3 input “block” and rejoining right after the n x n x 3 output block. The correct depiction of the skip connection in this case would be if the arrow originated right after the input block (but before the expansion block) and rejoined right after the projection, but before the output block. Right?
Sorry, I still think the diagram is correct as written, but it’s been a while since I listened to this lecture. I will need to watch it again and perhaps actually look at the MobileNet code in this week’s assignments to make sure I understand what is going on here.
Ok, I think this boils down to how to interpret what Prof Ng means by that diagram. I looked at model.summary()
of the MobileNet model in the Transfer Learning assignment. He’s not really showing a normal ConvNet diagram here as he did in C4 W1, where each block shows the output of that block. The entire diagram shows one complete “Bottleneck” section. The input to that whole section is n x n x 3, and that same input is also the input to the “skip” connection. Then in the bottleneck layer, you do the following operations:
1. The expansion, which is a 1 x 1 convolution with 18 output filters. So each filter is 1 x 1 x 3. That produces the second box in the diagram.
2. To get from the second to the third box in the diagram, you do a “depthwise” convolution. The input and output of that operation are both n x n x 18.
3. Then you do another 1 x 1 convolution to reduce the channels back to 3. So there are 3 filters there, each of dimension 1 x 1 x 18. That produces the output, which is n x n x 3.
4. The “skip input” of shape n x n x 3 is added to the output of step 3) to form the output of the entire operation.
So that is what the above diagram actually means. With that interpretation, I claim that my way of reading it matches the way the picture is drawn. You are simply interpreting the meaning of the boxes in the drawing differently, but we are both saying the same thing in terms of the actual operations that happen.
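If you want to check this against the real model, you can print the same summary I looked at. This is just a sketch; the exact arguments (input size, `include_top`, etc.) in the assignment notebook may differ:

```python
import tensorflow as tf

# Pretrained MobileNetV2 from Keras Applications (the input size here is just an example)
base_model = tf.keras.applications.MobileNetV2(input_shape=(160, 160, 3),
                                               include_top=False,
                                               weights='imagenet')
base_model.summary()
```

In the printout, each bottleneck section shows up as a run of layers whose names contain “expand”, “depthwise”, and “project”, and the sections where the residual connection applies (stride 1 and matching input/output channel counts) end with an Add layer that merges the skip branch.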
Ok, thanks @paulinpaloalto!