Andrew mentions that the 1x1 convolution is helpful in reducing the number of channels, but why can’t we just use a normal convolution layer with ‘same’ padding?
Hey, this is a good question! Indeed, if we use ‘same’ padding and the same number of filters, we get a new layer of the same size and number of channels. However, there is another important reason for using 1x1 convolutions, as you will learn in the next few videos: they are computationally a lot cheaper than 3x3 or 5x5 convolutions, which are pretty heavy. Hence, when developing deeper architectures, if we want to run 5x5 or 3x3 CONV layers, a 1x1 CONV is often placed before them to reduce the number of channels and so reduce the computation of the 3x3 and 5x5 convolutions.
So to answer your question: yes, we could reduce channels the way you describe, but the REASON we even want to reduce channels is to speed up the 3x3/5x5 CONV layers, because they are slow. If we used those same expensive convolutions to do the channel reduction, it would make no difference at all, or even be worse, than not reducing channels in the first place.
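A quick way to see this is to count multiplications: with ‘same’ padding and stride 1, a k x k CONV layer costs about H x W x C_out x (k x k x C_in) multiplies, so the cost scales with the kernel area times both channel counts. Compare these two ways of producing a 256-channel output: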
input (256 channels) -> 1x1 CONV (64 depth) -> 4x4 CONV (256 depth)
input (256 channels) -> 4x4 CONV (256 depth)
The top one is about 3.7 times faster. Note that we added the 1x1 to reduce channels and speed up the subsequent 4x4.
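If you want to check the arithmetic yourself, here is a minimal sketch of the multiplication counts (the 28x28 input size is just an assumption for illustration; the ratio doesn’t depend on it):

```python
# Minimal sketch: count the multiplications in each path.
# Assumptions (not from the original post): 28x28 spatial size,
# 'same' padding, stride 1, biases ignored.
H, W = 28, 28

def conv_mults(h, w, c_in, k, c_out):
    # Each of the h*w*c_out output values needs k*k*c_in multiplications.
    return h * w * c_out * (k * k * c_in)

# Path 1: 1x1 CONV down to 64 channels, then 4x4 CONV up to 256.
path1 = conv_mults(H, W, 256, 1, 64) + conv_mults(H, W, 64, 4, 256)
# Path 2: 4x4 CONV straight from 256 to 256 channels.
path2 = conv_mults(H, W, 256, 4, 256)

print(path2 / path1)  # ~3.76, i.e. the bottlenecked path is ~3.7x cheaper
```

The 1x1 bottleneck pays a small cost up front (1x1x256 multiplies per output value) to make the expensive 4x4 step work on 64 input channels instead of 256, which is where the savings come from.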
Hey, thanks a lot for the reply, this made a lot of sense. And in the later videos on Inception, I got a much better intuition of just how powerful this method can be.
Anytime! The feeling when it clicks really is the best.