There was a concept of a bottleneck layer to reduce the computation in Inception nets. The example used a 1x1 conv to reduce n_C from 192 to 16, then a 5x5 conv with 32 filters to bring it back up to n_C = 32. Why did he use n_C = 16 rather than 32 or any other number? I get that he started at 192 and the end goal was 32, so that's fine by me, but the bottleneck n_C could be anything. What does the choice depend on? Thank you.
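For concreteness, here is the cost arithmetic as I understand it, assuming the 28x28 spatial size from the lecture example (the channel numbers are the ones above):

```python
# Multiply counts for the example, assuming a 28x28 spatial size
# (taken from the lecture; adjust H, W if your input differs).
H, W = 28, 28

def conv_mults(h, w, n_filters, k, n_c_in):
    # Each output value needs k * k * n_c_in multiplications.
    return h * w * n_filters * k * k * n_c_in

direct = conv_mults(H, W, 32, 5, 192)        # 5x5 conv straight from 192 channels
bottleneck = (conv_mults(H, W, 16, 1, 192)   # 1x1 conv: 192 -> 16
              + conv_mults(H, W, 32, 5, 16)) # 5x5 conv: 16 -> 32

print(f"direct:     {direct:,}")       # 120,422,400
print(f"bottleneck: {bottleneck:,}")   # 12,443,648, roughly a 10x saving
```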
It's an interesting question. Normally you would choose a power of 2, since that maps nicely onto digital (binary) hardware. It's a bottleneck, so it has to be less than 32; it could be 16, 8, or 4, but the lower this number, the fewer trainable parameters you have, so I would guess 16 is a reasonable compromise… (these are just my thoughts).
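To put rough numbers on that, here is a quick sketch of the trainable parameter count of the 1x1 → 5x5 pair for different bottleneck widths, using the 192 input channels and 32 output filters from the example (bias terms included):

```python
# Trainable parameters of the 1x1 -> 5x5 conv pair for several
# bottleneck widths (192 input channels, 32 output filters).
for b in (4, 8, 16, 32):
    p_1x1 = 1 * 1 * 192 * b + b   # 1x1 conv: 192 -> b, plus b biases
    p_5x5 = 5 * 5 * b * 32 + 32   # 5x5 conv: b -> 32, plus 32 biases
    print(f"bottleneck {b:>2}: {p_1x1 + p_5x5:,} parameters")
# 4 -> 4,004   8 -> 7,976   16 -> 15,920   32 -> 31,808
```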
Well, maybe the point is that lowering computational cost is not the only goal here. You lose information if you make the bottleneck layer too small. If you make it super cheap, but the "reinflation" layer doesn't have enough information to learn all the details it needs, then you don't get a good enough result.
That would be my interpretation of Gent’s statement here:
Like all hyperparameter choices in ML/DL, you have to experiment to find the “Goldilocks” values. They must have tried some experiments and decided that 16 is “just right” or “close enough”.
But we are doing science here, so if you are ever using an Inception Network to solve a real problem, you can run this experiment yourself and see what happens with your datasets if you use 8 or 4 as the number of filters in the bottleneck layer instead of 16.
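If you want to try that, here is a minimal Keras sketch of the 1x1 → 5x5 pair, assuming a 28x28x192 input as in the lecture example; `bottleneck_block` is just a name I made up for illustration:

```python
import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_block(x, n_bottleneck=16, n_out=32):
    """1x1 'bottleneck' conv followed by a 5x5 conv, as in the lecture."""
    x = layers.Conv2D(n_bottleneck, 1, padding="same", activation="relu")(x)
    x = layers.Conv2D(n_out, 5, padding="same", activation="relu")(x)
    return x

inputs = tf.keras.Input(shape=(28, 28, 192))
outputs = bottleneck_block(inputs, n_bottleneck=16)  # swap in 8 or 4 here
model = tf.keras.Model(inputs, outputs)
model.summary()  # compare parameter counts across bottleneck widths
```

Training this block inside your own network with 4, 8, and 16 bottleneck filters and comparing accuracy is exactly the kind of experiment that would tell you whether 16 is "just right" for your dataset.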