Hi all,
In Week 2 Assignment 4 (of course 4), it says:
“Each block consists of an inverted residual structure with a bottleneck at each end. These bottlenecks encode the intermediate inputs and outputs in a low dimensional space, and prevent non-linearities from destroying important information.”
I’m not sure why the latter part (that the bottlenecks “prevent non-linearities from destroying important information”) is true. How do bottleneck layers accomplish this? In the paper, they mention “linear bottleneck layers”, which I guess apply some linear transformation? [1]
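To make the question concrete, here is a rough NumPy sketch of how I currently picture the block. It is only my own toy version, not the assignment's code: the function name `bottleneck_block`, the hand-picked weights, and the choice to skip the depthwise convolution and the residual add are all assumptions made for illustration. The 1x1 convolutions are treated as per-pixel matrix multiplies.

```python
import numpy as np

def relu6(x):
    # ReLU capped at 6.
    return np.minimum(np.maximum(x, 0.0), 6.0)

def bottleneck_block(x, w_expand, w_project, linear_bottleneck=True):
    """Toy block (hypothetical helper): expand -> ReLU6 -> project.

    x has shape (n_pixels, c_in). The depthwise 3x3 convolution and the
    residual connection are left out to keep the example small.
    """
    h = relu6(x @ w_expand)   # expand to a wider space, non-linearity applied here
    y = h @ w_project         # project back down to the low-dimensional bottleneck
    if not linear_bottleneck:
        y = relu6(y)          # extra non-linearity on the low-dimensional output
    return y

# Hand-picked weights (assumption): the expansion stores both x and -x,
# and the projection recombines them.
x = np.array([[1.0, -2.0]])
w_expand = np.array([[1.0, -1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, -1.0]])
w_project = np.array([[1.0, 0.0],
                      [-1.0, 0.0],
                      [0.0, 1.0],
                      [0.0, -1.0]])

linear_out = bottleneck_block(x, w_expand, w_project, linear_bottleneck=True)
relu_out = bottleneck_block(x, w_expand, w_project, linear_bottleneck=False)

print("input:            ", x)
print("linear bottleneck:", linear_out)
print("ReLU6 bottleneck: ", relu_out)
```

With these particular weights, the version with a linear bottleneck returns the input unchanged, while the version with ReLU6 on the bottleneck zeroes out the negative component. Is that roughly the intuition behind the quoted sentence, i.e. a non-linearity in the wide expanded space can be "undone", but one applied in the narrow space throws information away?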